Abstract. We study a notion of robustness of a Markov kernel that describes a
system of several input random variables and one output random variable. Ro- 
bustness requires that the behaviour of the system does not change if one or 
several of the input variables are knocked out. If the system is required to be 
robust against too many knockouts, then the output variable cannot distinguish 
reliably between input states and must be independent of the input. We study 
how many input states the output variable can distinguish as a function of the 
required level of robustness. 

Gibbs potentials allow a mechanistic description of the behaviour of the
system after knockouts. Robustness imposes structural constraints on these
potentials. We show that interaction families of Gibbs potentials can be used
to describe robust systems.

Given a distribution of the input random variables and the Markov kernel 
describing the system, we obtain a joint probability distribution. Robustness im- 
plies a number of conditional independence statements for this joint distribution. 
The set of all probability distributions corresponding to robust systems can be 
decomposed into a finite union of components, and we find parametrizations of 
the components. The decomposition corresponds to a primary decomposition of 
the conditional independence ideal and can be derived from more general results 
about generalized binomial edge ideals. 



1. Introduction 

Consider a stochastic system of n input nodes and one output node: 
    input:     X_1   X_2   X_3   ...   X_n
                          |
                       system
                          |
    output:              X_0

As shown in [1], there are two ingredients to robustness:

(1) If one or several of the input nodes are removed, the system behaviour 
should not change too much ("small exclusion dependence"). 

(2) The input nodes should make a causal contribution to the output node.

The second point is strictly necessary: If the behaviour of the output does not 
depend on the inputs at all, then it is usually not affected by a knockout of a subset 
of the inputs, but this exclusion independence is trivial. 

In this paper we do not use the information theoretic measures proposed in [1].
Instead, we start with a simple model of exclusion independence: We study systems
in which the behaviour of the output node does not change when one or more of the
input nodes are knocked out. We formalize our robustness requirements in terms of
a robustness specification $\mathcal{R}$, which consists of pairs $(R, x_R)$, where
$R$ is a subset of the inputs and $x_R$ is a joint state of the inputs in $R$. Let
$\mathcal{S}$ be a set of possible states of the input nodes. The system is
$\mathcal{R}$-robust in $\mathcal{S}$ if the behaviour of the system does not change
when the inputs not in $R$ are knocked out, provided that the inputs in $R$ are in
the state $x_R$ and the current state of all inputs belongs to $\mathcal{S}$.

If the robustness specification $\mathcal{R}$ is too large, or if the set
$\mathcal{S}$ is too large, then in any $\mathcal{R}$-robust system the output does
not depend on the input at all. In general, the behaviour of the system is
restricted by robustness requirements. Therefore, to study the causal contribution
of the input nodes to the output node, we investigate how varied the behaviour of a
system can be, given both $\mathcal{R}$ and $\mathcal{S}$. More precisely,
robustness specifications imply that the system cannot distinguish all input states,
and we may ask how many states the system can discern. This question is related
to the topic of error detecting codes, see Remark 6.

This paper is organized as follows: Section 2 contains our basic setting and
definitions. We find several equivalent formulations of our notion of robustness.
Moreover, we study the question of how many states an $\mathcal{R}$-robust system
can distinguish. Section 3 shows that our definitions generalize the notions of
canalyzing and nested canalyzing functions [8], which have been studied before in
the context of robustness. Section 4 proposes to model the different behaviours of
a system under various knockouts using a family of Gibbs potentials. Robustness
implies various constraints on these potentials. Section 5 discusses the
probabilistic behaviour of the whole system, including its inputs, when the input
variables are distributed according to some fixed input distribution. We determine
the set of all joint probability distributions such that the system is
$\mathcal{R}$-robust for all input states with non-vanishing probability.

Some of our results in Section 5 can also be derived from recent algebraic
results in [13] about generalized binomial ideals. These ideals generalize the
binomial edge ideals of [6] and [12]. Similar ideals have recently been studied in
the paper [14], which discusses what we call $(n-1)$-robustness in Section 6. In
this paper we give self-contained proofs that are also accessible to readers not
acquainted with the language of commutative algebra. We comment on the relation to
the algebraic results in Remark 25.

2. Robustness and canalyzing functions 

We consider $n$ input nodes, denoted by $1, 2, \dots, n$, and one output node,
denoted by $0$. For each $i = 0, 1, \dots, n$ the state of node $i$ is a discrete
random variable $X_i$ taking values in the finite set $\mathcal{X}_i$ of
cardinality $d_i$. The input state space is the set
$\mathcal{X}_{\mathrm{in}} = \mathcal{X}_1 \times \dots \times \mathcal{X}_n$, and
the joint state space is $\mathcal{X} = \mathcal{X}_0 \times \mathcal{X}_{\mathrm{in}}$.
For any subset $S \subseteq \{0, \dots, n\}$ write $X_S$ for the random vector
$(X_i)_{i \in S}$; then $X_S$ is a random variable with values in
$\mathcal{X}_S = \prod_{i \in S} \mathcal{X}_i$. For $S = [n] := \{1, \dots, n\}$ we
also write $X_{\mathrm{in}}$ instead of $X_{[n]}$. For any $x \in \mathcal{X}$, the
restriction of $x$ to a subset $S \subseteq \{0, \dots, n\}$ is the vector
$x|_S \in \mathcal{X}_S$ with $(x|_S)_i = x_i$ for all $i \in S$. In contrast, the
notation $x_S$ will refer to an arbitrary element of $\mathcal{X}_S$.

As a model for the computation of the output from the input, we use a stochastic
map (Markov kernel) $\kappa$ from $\mathcal{X}_{\mathrm{in}}$ to $\mathcal{X}_0$,
that is, $\kappa$ is a function that assigns to each $x \in \mathcal{X}_{\mathrm{in}}$
a probability distribution $\kappa(x)$ for the output $X_0$. Such a stochastic map
$\kappa$ can be represented by a matrix with matrix elements $\kappa(x; x_0)$,
$x \in \mathcal{X}_{\mathrm{in}}$, $x_0 \in \mathcal{X}_0$, satisfying
$\sum_{x_0 \in \mathcal{X}_0} \kappa(x; x_0) = 1$ for all $x \in \mathcal{X}_{\mathrm{in}}$.
For each $x \in \mathcal{X}_{\mathrm{in}}$ the probability distribution $\kappa(x)$
models the behaviour of $X_0$ when the input variables are in the state $x$. When
the input is distributed according to some input distribution $p_{\mathrm{in}}$,
then the joint distribution $p$ of input and output variables satisfies

    $p(X_0 = x_0, X_{\mathrm{in}} = x) = p_{\mathrm{in}}(X_{\mathrm{in}} = x)\,\kappa(x; x_0)$.

If $p_{\mathrm{in}}(X_{\mathrm{in}} = x) > 0$, then $\kappa(x)$ can be computed from
the joint probability distribution $p$ and equals the conditional distribution of
$X_0$, given that $X_{\mathrm{in}} = x$.
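
The relation between the input distribution, the kernel and the joint
distribution can be made concrete numerically. The following Python sketch is
only an illustration: the array layout (rows indexed by input states, columns by
output states), the function names and the example numbers are assumptions made
here and do not come from the paper.

```python
import numpy as np

def joint_distribution(p_in, kappa):
    """Joint table p[x, x0] = p_in[x] * kappa[x, x0] for a row-stochastic kappa."""
    p_in = np.asarray(p_in, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    assert np.allclose(kappa.sum(axis=1), 1.0), "each row of kappa must sum to 1"
    return p_in[:, None] * kappa

def recover_kernel(p):
    """Conditional distribution of the output given the input.

    Rows belonging to inputs of probability zero are undefined and set to NaN.
    """
    p_x = p.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p_x > 0, p / p_x, np.nan)

# Hypothetical example: four input states, two output states.
p_in = np.array([0.5, 0.5, 0.0, 0.0])
kappa = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [1.0, 0.0]])
p = joint_distribution(p_in, kappa)
kappa_rec = recover_kernel(p)
```

On the two input states with positive probability the recovered rows coincide
with the rows of the kernel; on the remaining states the joint distribution
carries no information about the kernel.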

When a subset $S$ of the input nodes is knocked out and only the nodes in
$R = [n] \setminus S$ remain, then the behaviour of the system changes. Without
further assumptions, the post-knockout function is not determined by $\kappa$ and
has to be specified separately. We model the post-knockout function by a further
stochastic map $\kappa_R : \mathcal{X}_R \times \mathcal{X}_0 \to [0, 1]$. A complete
specification of the system is given by the family $(\kappa_A)_{A \subseteq [n]}$ of
all possible post-knockout functions, which we refer to as functional modalities.
As a shorthand notation we denote functional modalities by $(\kappa_A)$. The
stochastic map $\kappa$ itself, which describes the normal behaviour of the system
without knockouts, can be identified with $\kappa_{[n]}$.

What does it mean for functional modalities to be robust? Assume that the input
is in state $x$, and that we knock out a set $S$ of inputs. Denoting the remaining
set of inputs by $R$, we say that $(\kappa_A)$ is robust in $x$ against knockout of
$S$ if $\kappa(x) = \kappa_R(x|_R)$, that is, if

(1)    $\kappa(x; x_0) = \kappa_R(x|_R; x_0)$ for all $x_0 \in \mathcal{X}_0$.

Let $\mathcal{R}$ be a collection of pairs $(R, x_R)$, where $R \subseteq [n]$ and
$x_R \in \mathcal{X}_R$. We call such a collection a robustness specification in the
following. We say that $(\kappa_A)$ is $\mathcal{R}$-robust in a set
$\mathcal{S} \subseteq \mathcal{X}_{\mathrm{in}}$ if

(2)    $\kappa(x) = \kappa_R(x|_R)$ whenever $x \in \mathcal{S}$ and $(R, x|_R) \in \mathcal{R}$.

The main example in this section will be the robustness structures

    $\mathcal{R}_k := \{(R, x_R) : R \subseteq [n],\ |R| \ge k,\ x_R \in \mathcal{X}_R\}$.

Equation (1) only compares the functional modality $\kappa_R$ after knockout with
the stochastic map $\kappa$ that describes the regular behaviour of the unperturbed
system. In particular, for $R \subseteq R' \subseteq [n]$, the functional modality
$\kappa_{R'}$ is in no way restricted by (1). Therefore, it may happen that a system
that is not robust against a knockout of a set $S' = [n] \setminus R'$ recovers its
regular behaviour if we knock out even more nodes. However, this is not the typical
situation. Therefore, it is natural to assume that the following holds: If
$(R, x_R) \in \mathcal{R}$ and if $R \subseteq R' \subseteq [n]$, then
$(R', x_{R'}) \in \mathcal{R}$ for all $x_{R'} \in \mathcal{X}_{R'}$ with
$x_{R'}|_R = x_R$. In this case we call the robustness specification $\mathcal{R}$
coherent. For example, the robustness structures $\mathcal{R}_k$ are coherent. The
notion of coherence will not play an important role in the following, but it is
interesting
from a conceptual point of view. It is related to the notion of coherency as used 
e.g. in 0. 

By definition, for robust functional modalities $(\kappa_A)$ the largest functional
modality $\kappa_{[n]}$ determines the smaller ones in the relevant points via (2).
This motivates the following definition: A stochastic map $\kappa$ is called
$\mathcal{R}$-robust in $\mathcal{S}$ if there exist functional modalities
$(\kappa_A)$ with $\kappa = \kappa_{[n]}$ that are $\mathcal{R}$-robust in
$\mathcal{S}$. More directly, $\kappa$ is $\mathcal{R}$-robust in $\mathcal{S}$ if
and only if

    $\kappa(x) = \kappa(y)$ whenever $x, y \in \mathcal{S}$, $x|_R = y|_R$ and $(R, x|_R) \in \mathcal{R}$.
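
The direct characterization can be checked by brute force on small examples. The
sketch below is a hedged illustration, not the authors' code: it assumes binary
inputs by default, 0-based node indices, a kernel stored as a dictionary from
input states to tuples of output probabilities, and the helper names `spec_k`
and `is_robust` are hypothetical.

```python
from itertools import combinations, product

def spec_k(n, k, alphabet=(0, 1)):
    """All pairs (R, x_R) with |R| >= k; R is a tuple of 0-based input indices."""
    return {(R, x_R)
            for r in range(k, n + 1)
            for R in combinations(range(n), r)
            for x_R in product(alphabet, repeat=r)}

def is_robust(kappa, spec, S, tol=1e-12):
    """True if kappa(x) = kappa(y) for all x, y in S that both restrict to x_R
    on R, for every pair (R, x_R) in the specification."""
    S = list(S)
    for R, x_R in spec:
        # States of S lying in the cylinder set C(R, x_R).
        block = [x for x in S if all(x[i] == v for i, v in zip(R, x_R))]
        for y in block[1:]:
            if any(abs(a - b) > tol for a, b in zip(kappa[block[0]], kappa[y])):
                return False
    return True

# Hypothetical kernel on two binary inputs with a binary output.
kappa = {(0, 0): (1.0, 0.0), (0, 1): (0.5, 0.5),
         (1, 0): (0.5, 0.5), (1, 1): (0.0, 1.0)}
spec = spec_k(2, 1)
robust_on_diagonal = is_robust(kappa, spec, [(0, 0), (1, 1)])
robust_everywhere = is_robust(kappa, spec, list(kappa))
```

In this example the kernel is robust on the two diagonal states, which share no
coordinate, but not on the full input space, where any robust kernel for this
specification would have to be constant.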
Figure 1. An illustration of Example 1 with $n = 4$. a) The graph
$G_{\mathcal{R}_3}$. b) An induced subgraph $G_{\mathcal{R}_3,\mathcal{S}}$. c) The
connected components of $G_{\mathcal{R}_3,\mathcal{S}}$. In fact, in this example
both connected components are cylinder sets. d) The induced subgraph
$G_{\mathcal{R}_2,\mathcal{S}}$, which is connected.



When studying robustness of a stochastic map $\kappa$ we may always assume that
$\mathcal{R}$ is coherent; for if $x|_R = y|_R$ implies $\kappa(x) = \kappa(y)$, then
$x|_{R'} = y|_{R'}$ also implies $\kappa(x) = \kappa(y)$ whenever
$R \subseteq R' \subseteq [n]$.

For any subset $R \subseteq [n]$ and $x_R \in \mathcal{X}_R$ let

    $C(R, x_R) := \{x \in \mathcal{X}_{\mathrm{in}} : x|_R = x_R\}$

be the corresponding cylinder set. Then $\kappa$ is $\mathcal{R}$-robust in
$\mathcal{S}$ if and only if $\kappa(x) = \kappa(y)$ for all
$x, y \in \mathcal{S} \cap C(R, x_R)$ and all $(R, x_R) \in \mathcal{R}$. In other
words, the stochastic map $\kappa$ is constant on $\mathcal{S} \cap C(R, x_R)$ for
all $(R, x_R) \in \mathcal{R}$.

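The following construction is useful to study robust functional modalities: Given
a robustness specification $\mathcal{R}$, define a graph $G_{\mathcal{R}}$ on
$\mathcal{X}_{\mathrm{in}}$ by connecting two elements
$x, y \in \mathcal{X}_{\mathrm{in}}$ by an edge if there is $(R, x_R) \in \mathcal{R}$
such that $x|_R = y|_R = x_R$. Denote by $G_{\mathcal{R},\mathcal{S}}$ the subgraph
of $G_{\mathcal{R}}$ induced by $\mathcal{S}$.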

Example 1. Assume that $\mathcal{X}_i = \{0, 1\}$ for $i = 1, \dots, n$. Then the
input state space $\mathcal{X}_{\mathrm{in}} = \{0, 1\}^n$ can be identified with the
vertices of an $n$-dimensional hypercube. The graph $G_{\mathcal{R}_{n-1}}$ is the
edge graph of this hypercube (Fig. 1a)). Cylinder sets correspond to faces of this
hypercube. If $R \subseteq [n]$ has cardinality $n - 1$, then the cylinder set
$C(R, x_R)$ is an edge; if $R$ has cardinality $n - 2$, then $C(R, x_R)$ is a
two-dimensional face. Fig. 1b) shows an induced subgraph of $G_{\mathcal{R}_3}$ for
$n = 4$. By comparison, the graph $G_{\mathcal{R}_{n-2}}$ has additional edges
corresponding to diagonals in the quadrangles of $G_{\mathcal{R}_{n-1}}$. For
example, the set of vertices marked black in Figure 1b) is connected in
$G_{\mathcal{R}_{n-2}}$, but not in $G_{\mathcal{R}_{n-1}}$ (Fig. 1d)).
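
The graph construction and the hypercube example can be explored with a small
union-find computation. This is a sketch under stated assumptions: binary
inputs, 0-based node indices, and a hypothetical set of states (the even-parity
vertices of the 3-cube) chosen here for illustration; it is not the set shown in
Figure 1.

```python
from itertools import combinations, product

def spec_k(n, k, alphabet=(0, 1)):
    """All pairs (R, x_R) with |R| >= k; R is a tuple of 0-based input indices."""
    return {(R, x_R)
            for r in range(k, n + 1)
            for R in combinations(range(n), r)
            for x_R in product(alphabet, repeat=r)}

def components(S, spec):
    """Connected components of the graph induced on S, where two states are
    adjacent if they lie in a common cylinder set C(R, x_R) of the specification."""
    S = list(S)
    parent = {x: x for x in S}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for R, x_R in spec:
        block = [x for x in S if all(x[i] == v for i, v in zip(R, x_R))]
        for y in block[1:]:
            parent[find(y)] = find(block[0])

    comps = {}
    for x in S:
        comps.setdefault(find(x), set()).add(x)
    return list(comps.values())

# Hypothetical S: even-parity vertices of the 3-cube, pairwise Hamming distance 2.
S = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
comps_R2 = components(S, spec_k(3, 2))   # cylinder sets fix at least 2 coordinates
comps_R1 = components(S, spec_k(3, 1))   # cylinder sets fix at least 1 coordinate
cube = list(product((0, 1), repeat=3))
comps_cube = components(cube, spec_k(3, 2))
```

With at least two fixed coordinates no two of the chosen states share a cylinder
set, so every state is its own component; with at least one fixed coordinate the
same set becomes connected, and the full cube is connected in both cases.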

Proposition 2. The following statements are equivalent for a stochastic map $\kappa$:

(1) $\kappa$ is $\mathcal{R}$-robust in $\mathcal{S}$.

(2) $\kappa$ is constant on $\mathcal{S} \cap C(R, x_R)$ for all $(R, x_R) \in \mathcal{R}$.

(3) $\kappa$ is constant on the connected components of $G_{\mathcal{R},\mathcal{S}}$.

(4) For any probability distribution $p_{\mathrm{in}}$ of $X_{\mathrm{in}}$ with
$p_{\mathrm{in}}(\mathcal{S}) = 1$ and for all $(R, x_R) \in \mathcal{R}$, the output
$X_0$ is stochastically independent of $X_{[n] \setminus R}$ given $X_R = x_R$.

Proof. The equivalence (1) $\Leftrightarrow$ (2) was already shown.

(2) $\Leftrightarrow$ (3): Condition (2) says that $\kappa$ is constant along each
edge of $G_{\mathcal{R},\mathcal{S}}$. By iteration this implies (3). In the other
direction, the subgraph of $G_{\mathcal{R},\mathcal{S}}$ induced by
$\mathcal{S} \cap C(R, x_R)$ is connected for all $(R, x_R) \in \mathcal{R}$, and
therefore (3) implies (2).

(2) $\Rightarrow$ (4): For any $x \in \mathcal{X}_{\mathrm{in}}$ with
$p_{\mathrm{in}}(x) > 0$, the conditional distribution of the output given the input
satisfies $p(X_0 = x_0 \mid X_{\mathrm{in}} = x) = \kappa(x; x_0)$. By (2),
$\kappa(x; x_0)$ is constant on $C(R, x|_R) \cap \mathcal{S}$. Hence the conditional
distribution does not depend on $x|_{[n] \setminus R}$, and so
$p(X_0 = x_0 \mid X_{\mathrm{in}} = x) = p(X_0 = x_0 \mid X_R = x|_R)$.

(4) $\Rightarrow$ (2): Let $p_{\mathrm{in}}$ be the uniform distribution on
$\mathcal{S}$ (or any other probability distribution with support $\mathcal{S}$),
and fix $(R, x_R) \in \mathcal{R}$. By assumption, for any $x \in \mathcal{S}$ with
$x|_R = x_R$, the conditional distribution
$p(X_0 = x_0 \mid X_{\mathrm{in}} = x) = \kappa(x; x_0)$ does not depend on
$x|_{[n] \setminus R}$. Therefore, $\kappa(x)$ is constant on
$\mathcal{S} \cap C(R, x_R)$. □

The choice of the set $\mathcal{S}$ is important: On the one hand, $\mathcal{S}$
should be large, because otherwise the notion of robustness is very weak. On the
other hand, if $\mathcal{S}$ is too large, then the equations (2) imply that the
output $X_0$ is (unconditionally) independent of all inputs. Proposition 2 gives a
hint how to choose the set $\mathcal{S}$: The goal is to have as many connected
components as possible in $G_{\mathcal{R},\mathcal{S}}$. This motivates the
following definition:

Definition 3. For any subset $\mathcal{S} \subseteq \mathcal{X}_{\mathrm{in}}$, the
set of connected components of $G_{\mathcal{R},\mathcal{S}}$ is called an
$\mathcal{R}$-robustness structure.

Let $\mathcal{B}$ be an $\mathcal{R}$-robustness structure, and let
$\mathcal{S} = \cup\mathcal{B}$. Let $f_{\mathcal{B}} : \mathcal{S} \to \mathcal{B}$
be the map that maps each $x \in \mathcal{S}$ to the block of $\mathcal{B}$
containing $x$. Any stochastic map $\kappa$ that is $\mathcal{R}$-robust on
$\mathcal{S}$ factorizes through $f_{\mathcal{B}}$, in the sense that there is a
stochastic map $\kappa'$ that maps each block in $\mathcal{B}$ to a probability
distribution on $\mathcal{X}_0$ and that satisfies
$\kappa = \kappa' \circ f_{\mathcal{B}}$. Conversely, any stochastic map $\kappa$
that factorizes through $f_{\mathcal{B}}$ is $\mathcal{R}$-robust.

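To any joint probability distribution $p_{\mathrm{in}}$ on $\mathcal{X}_{\mathrm{in}}$
with $p_{\mathrm{in}}(X_{\mathrm{in}} \in \mathcal{S}) = 1$ we can associate a random
variable $B = f_{\mathcal{B}}(X_1, \dots, X_n)$. If $\kappa$ is $\mathcal{R}$-robust
on $\mathcal{S}$, then $X_0$ is independent of $X_1, \dots, X_n$ given $B$. Note that
the random variable $B$ is only defined on $\cup\mathcal{B}$, which is a set of
measure one with respect to $p_{\mathrm{in}}$. The situation is illustrated by the
following graph:

    $(X_1, X_2, \dots, X_n) \longrightarrow B = f_{\mathcal{B}}(X_1, X_2, \dots, X_n) \longrightarrow X_0$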

When the robustness specification $\mathcal{R}$ is fixed, how much freedom is left
to choose a robust stochastic map $\kappa$? More precisely, how many components can
an $\mathcal{R}$-robustness structure $\mathcal{B}$ have?

Lemma 4. Let $\mathcal{B}$ be a robustness structure of the robustness
specification $\mathcal{R}$. Let $R \subseteq [n]$, $S = [n] \setminus R$ and
$\mathcal{Y}_R = \{x_R \in \mathcal{X}_R : (R, x_R) \in \mathcal{R}\}$. Then

    $|\mathcal{B}| \le |\mathcal{Y}_R| + |\mathcal{X}_R \setminus \mathcal{Y}_R| \cdot |\mathcal{X}_S|$.

Proof. The set $\mathcal{S} = \cup\mathcal{B}$ is the disjoint union of the sets
$C(R, x_R) \cap \mathcal{S}$ for $x_R \in \mathcal{Y}_R$ and the at most
$|\mathcal{X}_R \setminus \mathcal{Y}_R| \cdot |\mathcal{X}_S|$ singletons
$\{x\} \subseteq \mathcal{S}$ with $x|_R \notin \mathcal{Y}_R$. Each of these sets
induces a connected subgraph of $G_{\mathcal{R}}$. The statement now follows from
Proposition 2. □

Example 5. Suppose that $\mathcal{S} = \mathcal{X}_{\mathrm{in}}$. This means that any
$\mathcal{R}$-robustness structure $\mathcal{B}$ satisfies
$\cup\mathcal{B} = \mathcal{X}_{\mathrm{in}}$. If $G_{\mathcal{R}}$ is connected, then
$\mathcal{B}$ has just a single block. In this case the bound of Lemma 4 is usually
not tight. On the other hand, the bound is tight if
$\mathcal{R} = \{(R, x_R) : x_R \in \mathcal{X}_R\}$.

Remark 6 (Relation to coding theory). Assume that all $d_i$ are equal. We can
interpret $\mathcal{X}_{\mathrm{in}}$ as the set of words of length $n$ over the
alphabet $[d_1]$. Consider the uniform case $\mathcal{R} = \mathcal{R}_k$. Then the
task is to find a collection of subsets such that any two different subsets have
Hamming distance at least $n - k + 1$. A related problem appears in coding theory:
A code is a subset $\mathcal{Y}$ of $\mathcal{X}_{\mathrm{in}}$ and corresponds to the
case that each element of $\mathcal{B}$ is a singleton. If distinct elements of the
code have Hamming distance at least $n - k + 1$, then a message can be reliably
decoded even if only $k$ letters are transmitted. If all letters are transmitted,
but up to $n - k$ letters may contain an error, then this error may at least be
detected; hence such codes are called error detecting codes. In this setting, the
function $f_{\mathcal{B}}$ can be interpreted as the decoding operation. The problem
of finding a largest possible code such that all code words have a fixed minimum
distance is also known as the sphere packing problem. The maximal size
$A_{d_1}(n, n - k + 1)$ of such a code is unknown in general.



3. Canalyzing functions 

Our notion of $\mathcal{R}$-robust functional modalities naturally generalizes and
is motivated by canalyzing [9] and nested canalyzing functions [10]. Let
$f : \mathcal{X}_{\mathrm{in}} \to \mathcal{X}_0$ be a function, also called a
(deterministic) map. Such a map can be considered as a special case of a stochastic
map by identifying $f$ with

    $\kappa_f(x; x_0) := \begin{cases} 1, & \text{if } f(x) = x_0, \\ 0, & \text{otherwise.} \end{cases}$

We say that $f$ is $(R, x_R)$-canalyzing if the value of $f$ does not depend on the
input variables $X_{[n] \setminus R}$ given that the input variables $X_R$ are in
state $x_R$. In other words, an $(R, x_R)$-canalyzing function is constant on the
corresponding cylinder set:

    $x, x' \in C(R, x_R) \;\Rightarrow\; f(x) = f(x')$.

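Given a robustness specification $\mathcal{R}$, we say that a function $f$ is
$\mathcal{R}$-canalyzing if it is $(R, x_R)$-canalyzing for all
$(R, x_R) \in \mathcal{R}$. Clearly, the set of $\mathcal{R}$-canalyzing functions
strongly depends on $\mathcal{R}$. On the one hand, any function is
$\mathcal{R}$-canalyzing with respect to

    $\mathcal{R} = \{([n], x) : x \in \mathcal{X}_{\mathrm{in}}\}$.

On the other hand, for two different elements $i, j \in [n]$ and

    $\mathcal{R} = \{(\{i\}, x_i) : x_i \in \mathcal{X}_i\} \cup \{(\{j\}, x_j) : x_j \in \mathcal{X}_j\}$,

any $\mathcal{R}$-canalyzing function is constant. Note that constant functions are
$\mathcal{R}$-canalyzing for any $\mathcal{R}$.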
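
The two extreme cases can be verified by enumeration for Boolean functions of two
inputs. The sketch below is illustrative only; the encoding of a specification as
a set of pairs of tuples with 0-based indices and the function names are
assumptions made here.

```python
from itertools import product

def is_canalyzing(f, n, R, x_R, alphabet=(0, 1)):
    """True if f is constant on the cylinder set C(R, x_R)."""
    values = {f(x) for x in product(alphabet, repeat=n)
              if all(x[i] == v for i, v in zip(R, x_R))}
    return len(values) <= 1

def is_spec_canalyzing(f, n, spec, alphabet=(0, 1)):
    """True if f is (R, x_R)-canalyzing for every pair in the specification."""
    return all(is_canalyzing(f, n, R, x_R, alphabet) for R, x_R in spec)

def f_and(x):
    """Logical AND of two binary inputs."""
    return x[0] & x[1]

and_canalyzing_at_0 = is_canalyzing(f_and, 2, (0,), (0,))   # input 0 fixed to 0
and_canalyzing_at_1 = is_canalyzing(f_and, 2, (0,), (1,))   # input 0 fixed to 1

# Specification built from two different nodes i = 0 and j = 1, all values allowed.
two_nodes = {((i,), (a,)) for i in (0, 1) for a in (0, 1)}

# Enumerate all 16 Boolean functions of two inputs via their truth tables.
inputs = list(product((0, 1), repeat=2))
canalyzing_tables = []
for table in product((0, 1), repeat=4):
    lookup = dict(zip(inputs, table))
    if is_spec_canalyzing(lambda x, lookup=lookup: lookup[x], 2, two_nodes):
        canalyzing_tables.append(table)
```

Only the two constant truth tables survive the second specification, in line with
the statement that any function canalyzing with respect to it is constant.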

The following statement directly follows from Proposition 2.

Proposition 7. A function $f : \mathcal{X}_{\mathrm{in}} \to \mathcal{X}_0$ is
$\mathcal{R}$-canalyzing if and only if $\kappa_f$ is $\mathcal{R}$-robust in
$\mathcal{S} = \mathcal{X}_{\mathrm{in}}$.

Particular cases of canalyzing functions have been studied in the context of
robustness:

Example 8. (1) Canalyzing functions. A function $f$ with domain
$\mathcal{X}_{\mathrm{in}}$ is canalyzing in the sense of [9] if there exist an input
node $k \in [n]$, an input value $a \in \mathcal{X}_k$, and an output value
$b \in \mathcal{X}_0$ such that the value of $f$ is independent of
$X_{[n] \setminus \{k\}}$, given that $x|_k = a$. In other words, $f(x) = f(y) = b$
whenever $x|_k = y|_k = a$. A canalyzing function is $\mathcal{R}$-canalyzing with

    $\mathcal{R} := \{(R, x_R) : R \subseteq [n],\ k \in R,\ x_R \in \mathcal{X}_R,\ x_R|_k = a\}$.

(2) Nested canalyzing functions have been studied in [10]. A function $f$ is nested
canalyzing in the variable order $X_1, \dots, X_n$ with canalyzing input values
$a_1 \in \mathcal{X}_1, \dots, a_n \in \mathcal{X}_n$ and canalyzed output values
$b_1, \dots, b_n$ if $f$ satisfies $f(x) = b_k$ for all
$x \in \mathcal{X}_{\mathrm{in}}$ satisfying $x|_k = a_k$ and $x|_i \neq a_i$ for all
$i < k$. Let $\mathcal{R} := \bigcup_{k=1}^{n} \mathcal{R}^{(k)}$, where

    $\mathcal{R}^{(k)} := \{(R, x_R) : [k] \subseteq R,\ x_R|_1 \neq a_1, \dots, x_R|_{k-1} \neq a_{k-1},\ x_R|_k = a_k\}$.

It is easy to see that $f$ is a nested canalyzing function if and only if it is
$\mathcal{R}$-canalyzing.

The set of Boolean nested canalyzing functions has been described algebraically
in [7] as a variety over the finite field $\mathbb{F}_2$. Here, we use a different
viewpoint, which allows us to study not only deterministic functions, but also
stochastic functions.



4. Robustness and Gibbs representation 

Let $(\kappa_A)$ be a collection of functional modalities, as defined in Section 2.
Instead of providing a list of all functional modalities $\kappa_A$, one can
describe them in more mechanistic terms. To illustrate this, we first consider an
example from the field of neural networks: Assume that the output node receives an
input $x = (x_1, \dots, x_n) \in \{-1, +1\}^n$ and generates the output $+1$ with
probability

    $\frac{1}{1 + e^{-(\sum_{i=1}^{n} w_i x_i - \eta)}}$.

For an arbitrary output $x_0$ this implies

(3)    $\kappa(x_1, \dots, x_n; x_0) := \frac{e^{\frac{1}{2}(\sum_{i=1}^{n} w_i x_i - \eta) x_0}}{e^{\frac{1}{2}(\sum_{i=1}^{n} w_i x_i - \eta)} + e^{-\frac{1}{2}(\sum_{i=1}^{n} w_i x_i - \eta)}}$.

The structure of this representation of the stochastic map $\kappa$ already
suggests what the function should be after a knockout of a set $S$ of input nodes:
Simply remove the contribution of all the nodes in $S$. The post-knockout function
is then given by

(4)    $\kappa_R(x_R; x_0) := \frac{e^{\frac{1}{2}(\sum_{i \in R} w_i x_i - \eta) x_0}}{e^{+\frac{1}{2}(\sum_{i \in R} w_i x_i - \eta)} + e^{-\frac{1}{2}(\sum_{i \in R} w_i x_i - \eta)}}$,

where $R = [n] \setminus S$. These post-knockout functional modalities are based on
the decomposition of the sum that appears in (3).
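
A short numerical sketch may help to see the effect of a knockout in this model.
It assumes that the kernel has the logistic form
$e^{a x_0 / 2} / (e^{a/2} + e^{-a/2})$ with $a = \sum_{i \in R} w_i x_i - \eta$,
which is our reading of (3) and (4); the weights, the threshold and the function
name are hypothetical values chosen for illustration, and node indices are
0-based.

```python
import math

def kappa_knockout(x, w, eta, R, x0):
    """Probability of output x0 in {-1, +1} when only the inputs in R remain.

    Assumes the form exp(a*x0/2) / (exp(a/2) + exp(-a/2)),
    with a = sum_{i in R} w_i x_i - eta.
    """
    a = sum(w[i] * x[i] for i in R) - eta
    return math.exp(0.5 * a * x0) / (math.exp(0.5 * a) + math.exp(-0.5 * a))

w = [1.0, -2.0, 0.5]      # hypothetical weights
eta = 0.3                 # hypothetical threshold
x = (+1, -1, +1)          # input state

full = kappa_knockout(x, w, eta, R=(0, 1, 2), x0=+1)    # no knockout
after = kappa_knockout(x, w, eta, R=(0, 2), x0=+1)      # node 1 knocked out
total = sum(kappa_knockout(x, w, eta, (0, 2), x0) for x0 in (-1, +1))
```

For these numbers the knockout removes a positive contribution to the sum, so the
probability of the output $+1$ decreases; the two output probabilities still sum
to one.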

More generally, we consider the following model of $(\kappa_A)$:

(5)    $\kappa_A(x_A; x_0) = \frac{e^{\sum_{B \subseteq A} \phi_B(x_A|_B, x_0)}}{\sum_{x_0' \in \mathcal{X}_0} e^{\sum_{B \subseteq A} \phi_B(x_A|_B, x_0')}}$,

where the $\phi_B$ are functions on $\mathcal{X}_B \times \mathcal{X}_0$. Such a sum
decomposition of $\kappa$ is referred to as a Gibbs representation of $\kappa$ and
contains more information than $\kappa$ itself. Clearly, each $\kappa_A$ is strictly
positive. Using the Möbius inversion, it is easy to see that each strictly positive
family $(\kappa_A)$ has a representation of the form (5) with

(6)    $\phi_A(x_A, x_0) := \sum_{C \subseteq A} (-1)^{|A \setminus C|} \ln \kappa_C(x_A|_C; x_0)$.

Note that this representation is not unique: If an arbitrary function of $x_A$
(that does not depend on $x_0$) is added to the function $\phi_A$, then the function
$\kappa_A$, defined via (5), does not change.
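
The Möbius inversion can be checked numerically: starting from arbitrary strictly
positive functional modalities, the potentials
$\phi_A = \sum_{C \subseteq A} (-1)^{|A \setminus C|} \ln \kappa_C$ should
reproduce the modalities when inserted into the normalized exponential form. The
following sketch assumes two binary inputs and a binary output; the dictionary
layout, the random test data and all function names are hypothetical.

```python
import itertools
import math
import random

def subsets(A):
    """All subsets of the tuple A, as sorted tuples."""
    A = tuple(A)
    return [c for r in range(len(A) + 1) for c in itertools.combinations(A, r)]

def restrict(x_A, A, C):
    """Restrict the state x_A (indexed by the tuple A) to the subset C of A."""
    pos = {node: j for j, node in enumerate(A)}
    return tuple(x_A[pos[i]] for i in C)

def moebius_potentials(kappa, n, states=(0, 1), outputs=(0, 1)):
    """phi[A, x_A, x0] = sum over C subset of A of (-1)^{|A \\ C|} ln kappa_C."""
    phi = {}
    for A in subsets(range(n)):
        for x_A in itertools.product(states, repeat=len(A)):
            for x0 in outputs:
                phi[A, x_A, x0] = sum(
                    (-1) ** (len(A) - len(C))
                    * math.log(kappa[C, restrict(x_A, A, C), x0])
                    for C in subsets(A))
    return phi

def gibbs_kernel(phi, A, x_A, x0, outputs=(0, 1)):
    """Normalized exponential of the summed potentials phi_B, B subset of A."""
    def energy(y0):
        return sum(phi[B, restrict(x_A, A, B), y0] for B in subsets(A))
    return math.exp(energy(x0)) / sum(math.exp(energy(y0)) for y0 in outputs)

# Random strictly positive functional modalities on n = 2 binary inputs.
random.seed(0)
n = 2
kappa = {}
for A in subsets(range(n)):
    for x_A in itertools.product((0, 1), repeat=len(A)):
        p = random.uniform(0.1, 0.9)
        kappa[A, x_A, 0] = p
        kappa[A, x_A, 1] = 1.0 - p

phi = moebius_potentials(kappa, n)
max_error = max(abs(gibbs_kernel(phi, A, x_A, x0) - value)
                for (A, x_A, x0), value in kappa.items())
```

The reconstruction error is at the level of floating-point rounding, which
reflects the fact that the summed Möbius potentials equal $\ln \kappa_A$ exactly.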

A single robustness constraint has the following consequences for the $\phi_A$.

Proposition 9. Let $S \subseteq [n]$ and $R = [n] \setminus S$, and let $(\kappa_A)$
be strictly positive functional modalities with Gibbs potentials $(\phi_A)$. Then
$(\kappa_A)$ is robust in $x$ against knockout of $S$ if and only if
$\sum_{B \subseteq [n],\, B \not\subseteq R} \phi_B(x|_B, x_0)$ does not depend on
$x_0$.

Proof. Denote by $\tilde\phi_A$ the potentials defined via (6). Then (1) is
equivalent to

    $\sum_{B \subseteq [n]} \tilde\phi_B(x|_B, x_0) = \sum_{B \subseteq R} \tilde\phi_B(x|_B, x_0) \quad\Longleftrightarrow\quad \sum_{B \subseteq [n],\, B \not\subseteq R} \tilde\phi_B(x|_B, x_0) = 0$.

The statement follows from the fact that
$\phi_B(x|_B; x_0) - \tilde\phi_B(x|_B; x_0)$ is independent of $x_0$ (for fixed
$x$). □

Example 10. Consider $n = 2$ binary inputs, $\mathcal{X}_1 = \mathcal{X}_2 = \{0, 1\}$,
and let $\mathcal{S} = \{(0, 0), (1, 1)\}$. Then 1-robustness on $\mathcal{S}$ means

    $\kappa_{\{1\}}(x_1; x_0) = \kappa_{\{1,2\}}(x_1, x_2; x_0) = \kappa_{\{2\}}(x_2; x_0)$

for all $x_0$ whenever $x_1 = x_2$. By Proposition 9 this translates into the
conditions

(7)    $\phi_{\{1,2\}}(x_1, x_2; x_0) + \phi_{\{1\}}(x_1; x_0) = 0 = \phi_{\{1,2\}}(x_1, x_2; x_0) + \phi_{\{2\}}(x_2; x_0)$

for all $x_0$ whenever $x_1 = x_2$, for the potentials $(\phi_A)$ defined via (6).
This means: Assuming that $(\kappa_A)$ is 1-robust, it suffices to specify the four
functions

    $\phi_\emptyset(x_0)$, $\phi_{\{1\}}(x_1; x_0)$, $\phi_{\{1,2\}}(0, 1; x_0)$, $\phi_{\{1,2\}}(1, 0; x_0)$.

The remaining potentials can be deduced from (7). If only the values of $(\kappa_A)$
for $x \in \mathcal{S}$ are needed, then it suffices to specify
$\phi_\emptyset(x_0)$ and $\phi_{\{1\}}(x_1; x_0)$.

Does $\mathcal{R}$-robustness in $x$ imply any structural constraints on
$(\kappa_A)$? If $(\kappa_A)$ is $\mathcal{R}$-robust in $x$ for all $x$ belonging to
a set $\mathcal{S}$, then the corresponding conditions imposed by Proposition 9
depend on $\mathcal{S}$. In this section, we are interested in conditions that are
independent of $\mathcal{S}$. Such conditions allow us to define sets of functional
modalities that contain all $\mathcal{R}$-robust functional modalities for all
possible sets $\mathcal{S}$. If $\mathcal{S}$ (which will be the support of the input
distribution in Section 5) is unknown from the beginning, then the system can choose
its policy within such a restricted set of functional modalities. To find results
that are independent of $\mathcal{S}$, our trick is to find a set $M_{\mathcal{R}}$
of functional modalities such that $(\kappa_A)$ can be approximated on $\mathcal{S}$
by functional modalities in $M_{\mathcal{R}}$. The approximation will be independent
of $\mathcal{S}$.

We first consider the special case
$\mathcal{R} = \mathcal{R}_k := \{(R, x_R) : R \subseteq [n],\ |R| \ge k,\ x_R \in \mathcal{X}_R\}$.
For simplicity, we replace any prefix or subscript $\mathcal{R}_k$ by $k$. Denote by
$M_{k+1}$ the set of all functional modalities $(\kappa_A)$ such that there exist
potentials $\phi_A$ of the form

    $\phi_A(x_A; x_0) = \sum_{B \subseteq A,\ |B| < k+1} \alpha_{A,B}\, \Psi_B(x_A|_B; x_0)$,

where $\alpha_{A,B} \in \mathbb{R}$ and $\Psi_B$ is an arbitrary function
$\mathcal{X}_B \times \mathcal{X}_0 \to \mathbb{R}$. The set $M_{k+1}$ is called the
family of $(k+1)$-interaction functional modalities. Note that the functions
$\Psi_B$ do not depend on $A$. This ensures a certain interdependence among the
functional modalities $\kappa_A$. The name "$(k+1)$-interaction" comes from the fact
that each potential depends on the $k$ (or fewer) variables in $B$ plus the output
variable $X_0$. Since $M_{k+1}$ only contains strictly positive functional
modalities, we are also interested in the closure of $M_{k+1}$ with respect to the
usual topology on the space of matrices, considered as elements of a
finite-dimensional real vector space.

Example 11. The functional modalities (4), derived from the classical model (3)
of a neural network, belong to $M_2$. To illustrate the difference between $M_2$
and its closure, consider the functional modalities $(\kappa_A)$ with

    $\kappa_A(x_1, \dots, x_n; x_0) := \frac{e^{\frac{1}{2}\beta(\sum_{i \in A} w_i x_i - \eta) x_0}}{e^{\frac{1}{2}\beta(\sum_{i \in A} w_i x_i - \eta)} + e^{-\frac{1}{2}\beta(\sum_{i \in A} w_i x_i - \eta)}}$.

If $w_1, \dots, w_n$ and $\eta$ are fixed and $\beta \to \infty$, then

(8)    $\kappa_A(x_1, \dots, x_n; +1) \to \theta\Big(\sum_{i \in A} w_i x_i - \eta\Big)$,

where

    $\theta(x) = \begin{cases} 1, & \text{if } x > 0, \\ \frac{1}{2}, & \text{if } x = 0, \\ 0, & \text{if } x < 0. \end{cases}$

The functional modalities (8) are deterministic limits of the probabilistic
model (3), called linear threshold functions. They lie in the closure of $M_2$, but
not in $M_2$ itself.

Linear threshold functions are widely used as elementary building blocks in
network dynamics, for example to build simple models of neural networks, metabolic
networks or gene-regulation networks. Robustness against knockouts of such
networks has been studied in [2], exploring the example of the yeast cell cycle.

Let $\widetilde{M}_{k+1}$ be the set of strictly positive functional modalities
$(\kappa_A)$ such that

(9)    $\kappa_C(x_C; x_0) = \frac{1}{Z_{C,x_C}} \exp\Big( \binom{|C|}{k}^{-1} \sum_{B \subseteq C,\ |B| = k} \ln \kappa_B(x_C|_B; x_0) \Big) = \frac{1}{Z_{C,x_C}} \Big( \prod_{B \subseteq C,\ |B| = k} \kappa_B(x_C|_B; x_0) \Big)^{1/\binom{|C|}{k}}$

for all $C \subseteq [n]$ with $|C| \ge k$, where $Z_{C,x_C}$ is a normalization
constant that ensures that $\kappa_C(x_C)$ is a probability distribution. Note that
equations (9) can be used to parametrize the set $\widetilde{M}_{k+1}$: The
stochastic maps $\kappa_A$ with $|A| \le k$ can be chosen arbitrarily, while all
other stochastic maps $\kappa_C$ with $|C| > k$ can be computed by normalizing the
geometric mean of the stochastic maps $\kappa_B$ for $B \subseteq C$ and $|B| = k$.
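
The parametrization by normalized geometric means can be sketched in a few lines.
The code below is an illustration under assumptions made here: the modalities of
order $k$ are stored in a dictionary keyed by `(B, x_B)` with 0-based node tuples,
each value being a dictionary from outputs to strictly positive probabilities; the
function name and the example numbers are hypothetical.

```python
import itertools
import math

def geometric_mean_kernel(kappa_k, C, x, k, outputs):
    """Normalized geometric mean of kappa_B(x|_B), over B subset of C with |B| = k.

    kappa_k[(B, x_B)] is a dict mapping each output to a positive probability.
    Returns kappa_C(x|_C) as a dict over the outputs.
    """
    subsets_k = list(itertools.combinations(C, k))   # binom(|C|, k) subsets
    log_mean = {
        x0: sum(math.log(kappa_k[B, tuple(x[i] for i in B)][x0])
                for B in subsets_k) / len(subsets_k)
        for x0 in outputs
    }
    Z = sum(math.exp(v) for v in log_mean.values())  # normalization constant
    return {x0: math.exp(v) / Z for x0, v in log_mean.items()}

outputs = (0, 1)
# Hypothetical order-1 modalities on two binary inputs.
kappa_1 = {
    ((0,), (0,)): {0: 0.8, 1: 0.2},
    ((1,), (0,)): {0: 0.8, 1: 0.2},
    ((0,), (1,)): {0: 0.9, 1: 0.1},
    ((1,), (1,)): {0: 0.5, 1: 0.5},
}
agree = geometric_mean_kernel(kappa_1, (0, 1), (0, 0), 1, outputs)
differ = geometric_mean_kernel(kappa_1, (0, 1), (1, 1), 1, outputs)
```

When all contributing modalities coincide at a state, as for `(0, 0)` here, the
geometric mean returns the common distribution and the normalization constant
equals one; this is the mechanism that robustness exploits. When they differ, the
result is a renormalized compromise between them.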

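Lemma 12. $\widetilde{M}_{k+1}$ is a subset of $M_{k+1}$. It consists of those
functional modalities $(\kappa_A)$ where the coefficients $\alpha_{A,B}$ additionally
satisfy

    $(-1)^{|A|} \alpha_{A,B} = (-1)^{|A'|} \alpha_{A',B}$ whenever $B \subseteq A \cap A'$ and $|B| < k$,

and

    $(-1)^{|A|} |A|\, \alpha_{A,B} = (-1)^{|A'|} |A'|\, \alpha_{A',B}$ whenever $B \subseteq A \cap A'$ and $|B| = k$,

for all $x_B \in \mathcal{X}_B$ and $x_0 \in \mathcal{X}_0$.

Proof. Assume that the coefficients $\alpha_{A,B}$ of $(\kappa_A) \in M_{k+1}$
satisfy the conditions stated in the lemma. We may multiply all functions $\Psi_B$
by scalars and assume

(10)    $\alpha_{A,B} = (-1)^{|A|-|B|}$ if $|B| < k$, \qquad $\alpha_{A,B} = (-1)^{|A|-k} \frac{k}{|A|}$ if $|B| = k$.

Then $\ln(\kappa_C(x_C; x_0))$ equals the logarithm of the normalization constant
plus

    $\sum_{A \subseteq C} \Big[ \sum_{B \subseteq A,\ |B| < k} (-1)^{|A|-|B|} \Psi_B(x_C|_B; x_0) + \sum_{B \subseteq A,\ |B| = k} (-1)^{|A|-k} \frac{k}{|A|} \Psi_B(x_C|_B; x_0) \Big]$

    $= \sum_{B \subseteq C,\ |B| < k} \Big( \sum_{R \subseteq C \setminus B} (-1)^{|R|} \Big) \Psi_B(x_C|_B; x_0) + \sum_{B \subseteq C,\ |B| = k} \Big( \sum_{R \subseteq C \setminus B} (-1)^{|R|} \frac{k}{|R| + k} \Big) \Psi_B(x_C|_B; x_0)$

    $= \sum_{B \subseteq C,\ |B| < k} \Big( \sum_{i=0}^{|C|-|B|} (-1)^i \binom{|C|-|B|}{i} \Big) \Psi_B(x_C|_B; x_0) + \sum_{B \subseteq C,\ |B| = k} \Big( \sum_{i=0}^{|C|-k} (-1)^i \binom{|C|-k}{i} \frac{k}{i + k} \Big) \Psi_B(x_C|_B; x_0)$

(11)    $= \sum_{B \subseteq C,\ |B| < k} \delta_{|C|,|B|}\, \Psi_B(x_C|_B; x_0) + \sum_{B \subseteq C,\ |B| = k} \binom{|C|}{k}^{-1} \Psi_B(x_C|_B; x_0)$,

where the identity
$\sum_{i=0}^{m} \binom{m}{i} \frac{(-1)^i}{i + k} = \big( (m + k) \binom{m+k-1}{m} \big)^{-1}$
was used and $\delta$ denotes Kronecker's delta. For $|C| \ge k$ the first sum
vanishes, and it follows that $\kappa_C$ satisfies the defining equality of
$\widetilde{M}_{k+1}$.

Conversely, if $(\kappa_A) \in \widetilde{M}_{k+1}$, then let $\alpha_{A,B}$ be as
in (10), and let

    $\Psi_B(x_B; x_0) = \log(\kappa_B(x_B; x_0))$ for all $x_0 \in \mathcal{X}_0$, $x_B \in \mathcal{X}_B$, $|B| \le k$.

These coefficients $\alpha_{A,B}$ and functions $\Psi_B$ together define an element
$(\tilde\kappa_A) \in M_{k+1}$. The calculation (11) shows that

    $\tilde\kappa_A(x_A; x_0) = \begin{cases} \frac{1}{Z_{A,x_A}} \exp(\Psi_A(x_A; x_0)) = \kappa_A(x_A; x_0), & \text{if } |A| \le k, \\ \frac{1}{Z_{A,x_A}} \exp\Big( \binom{|A|}{k}^{-1} \sum_{B \subseteq A,\ |B| = k} \ln \kappa_B(x_A|_B; x_0) \Big), & \text{if } |A| > k, \end{cases}$

and so $(\tilde\kappa_A) = (\kappa_A)$ belongs to $M_{k+1}$ and is of the desired
form. □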
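
The proof rests on two binomial identities: the alternating sum
$\sum_{i=0}^{m} (-1)^i \binom{m}{i} = \delta_{m,0}$ and, in the normalization that
is convenient for the coefficients $\frac{k}{|A|}$, the identity
$\sum_{i=0}^{m} (-1)^i \binom{m}{i} \frac{k}{i+k} = \binom{m+k}{k}^{-1}$. Both can
be confirmed with exact rational arithmetic for small parameters. The sketch below
is only a sanity check over a finite range, not a proof, and the function names
are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def weighted_alternating_sum(m, k):
    """sum_{i=0}^{m} (-1)^i * binom(m, i) * k / (i + k), as an exact fraction."""
    return sum(Fraction((-1) ** i * comb(m, i) * k, i + k) for i in range(m + 1))

def alternating_sum(m):
    """sum_{i=0}^{m} (-1)^i * binom(m, i), which equals 1 for m = 0 and 0 otherwise."""
    return sum((-1) ** i * comb(m, i) for i in range(m + 1))

identity_holds = all(
    weighted_alternating_sum(m, k) == Fraction(1, comb(m + k, k))
    for m in range(0, 9) for k in range(1, 7))
delta_holds = all(alternating_sum(m) == (1 if m == 0 else 0) for m in range(0, 9))
```

With $m = |C| - k$ the right-hand side becomes $\binom{|C|}{k}^{-1}$, which is the
exponent of the geometric mean in the definition of the family.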

Theorem 13. Let $(\kappa_A)$ be functional modalities. Then there exist functional
modalities $(\tilde\kappa_A)$ in the closure of $\widetilde{M}_{k+1}$ such that the
following holds: If $(\kappa_A)$ is $k$-robust on a set
$\mathcal{S} \subseteq \mathcal{X}_{\mathrm{in}}$, then
$\kappa_A(x|_A) = \tilde\kappa_A(x|_A)$ for all $A \subseteq [n]$ and all
$x \in \mathcal{S}$. In particular, $(\tilde\kappa_A)$ belongs to the closure of the
family of $(k+1)$-interactions.

Proof. Define $(\tilde\kappa_A)$ via

    $\tilde\kappa_A(x_A; x_0) = \begin{cases} \kappa_A(x_A; x_0), & \text{if } |A| \le k, \\ \frac{1}{Z_{A,x_A}} \Big( \prod_{B \subseteq A,\ |B| = k} \kappa_B(x_A|_B; x_0) \Big)^{1/\binom{|A|}{k}}, & \text{else,} \end{cases}$

where $Z_{A,x_A}$ is a normalization constant. By definition, $(\tilde\kappa_A)$ lies
in the closure of $\widetilde{M}_{k+1}$. Let $x \in \mathcal{S}$ and
$C \subseteq [n]$. If $|C| \le k$, then $\tilde\kappa_C(x|_C) = \kappa_C(x|_C)$ by
definition of $\tilde\kappa_A$. So assume that $|C| > k$. By definition of
$k$-robustness, if $x \in \mathcal{S}$, then $\kappa_C(x|_C) = \kappa_B(x|_B)$ for all
$B \subseteq C$ with $|B| = k$. Therefore, if $x \in \mathcal{S}$ and $|C| > k$, then

    $\kappa_C(x|_C; x_0) = \Big( \prod_{B \subseteq C,\ |B| = k} \kappa_B(x|_B; x_0) \Big)^{1/\binom{|C|}{k}}$.

Hence $Z_{C,x|_C} = 1$ and $\kappa_C(x|_C) = \tilde\kappa_C(x|_C)$. □

Since $M_{k+1}$ and $\widetilde{M}_{k+1}$ are independent of $\mathcal{S}$,
Theorem 13 shows that these two families can be used to construct robust systems
when the set $\mathcal{S}$ is not known a priori but must be learnt by the system,
or when $\mathcal{S}$ changes with time and the system must adapt.

If we are not interested in all functional modalities but just the stochastic map
$\kappa$ describing the unperturbed system, we can describe $\kappa$ in terms of low
interaction order. The family of $(k+1)$-interaction stochastic maps, denoted by
$K_{k+1}$, consists of all strictly positive maps $\kappa$ such that

    $\ln \kappa(x; x_0) = \sum_{A \subseteq [n],\ |A| \le k} \Psi_A(x|_A; x_0)$

for some real functions $\Psi_A : \mathcal{X}_A \times \mathcal{X}_0 \to \mathbb{R}$.



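Corollary 14. Let $\kappa$ be a stochastic map. For given $k$ there exists a
stochastic map $\tilde\kappa$ in the closure of $K_{k+1}$ such that the following
holds: If $\kappa$ is $k$-robust on a set $\mathcal{S}$, then
$\kappa(x) = \tilde\kappa(x)$ for all $x \in \mathcal{S}$.

Proof. If $\kappa$ is $k$-robust on $\mathcal{S}$, there exist functional modalities
$(\kappa_A)_A$ with $\kappa = \kappa_{[n]}$ that are $k$-robust on $\mathcal{S}$.
Choose $(\tilde\kappa_A)$ as in Theorem 13. If $x \in \mathcal{S}$, then
$\kappa(x) = \kappa_{[n]}(x) = \tilde\kappa_{[n]}(x)$. Hence the corollary holds true
with $\tilde\kappa = \tilde\kappa_{[n]}$. □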

Example 15. The functional modalities (4) do not lie in $\widetilde{M}_2$. This does
not mean that neural networks are not robust: In fact, it is possible to naturally
redefine the functional modalities (4) such that the new functional modalities lie
in $\widetilde{M}_2$.

The construction (4) identifies the summand $w_i x_i x_0$ with $\phi_{\{i\}}$. Now
we will make another identification: For each $i \in [n]$ let

    $\kappa_{\{i\}}(x_i; x_0) = \frac{1}{Z_{x_i}} \exp\big( \tfrac{1}{2} (n\, w_i x_i - \eta)\, x_0 \big)$.

The unique extension of these stochastic maps to functional modalities
$(\kappa_A)$ in $\widetilde{M}_2$ is given by

(12)    $\kappa_A(x|_A; x_0) = \frac{1}{Z_{A,x|_A}} \Big( \prod_{i \in A} \kappa_{\{i\}}(x_i; x_0) \Big)^{1/|A|} = \frac{1}{Z'_{A,x|_A}} \exp\Big( \tfrac{1}{2} \Big( \frac{n}{|A|} \sum_{i \in A} w_i x_i - \eta \Big) x_0 \Big)$,

where $Z_{A,x|_A}$ and $Z'_{A,x|_A}$ are constants determined by normalization. The
functional modalities defined in this way lie in $\widetilde{M}_2$, and the
stochastic map $\kappa_{[n]}$ agrees with (3). Note that, by tuning the parameters
$w_1, \dots, w_n$, any combination of stochastic maps is possible for
$\kappa_{\{1\}}, \dots, \kappa_{\{n\}}$. This shows that any element of
$\widetilde{M}_2$ has a representation of the form (12).

As in Example 11 we can scale the weights $w_i$ and the threshold $\eta$ by a factor
of $\beta$ and send $\beta \to +\infty$. This leads to the rule

(13)    $\kappa_A(x_A; +1) \to \theta\Big( \frac{n}{|A|} \sum_{i \in A} w_i x_i - \eta \Big)$,

which is a normalized variant of (8).

The rule (12) implements a renormalization of the effect of the remaining inputs
under knockout. Similar renormalization procedures are sometimes used when
training neural networks using Hebb's rule. Usually the total sum of the weights
$\sum_i w_i$ is normalized so that it does not grow to infinity. The rule (12)
suggests that under knockout all remaining weights are amplified by a common
factor.

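The ideas leading to Theorem 13 can be applied to more general robustness
structures $\mathcal{R}$ as follows: For any $x \in \mathcal{X}$ let

    $\mathcal{R}_x := \begin{cases} \{R \subseteq [n] : (R, x|_R) \in \mathcal{R}\}, & \text{if there exists } R \subseteq [n] \text{ with } (R, x|_R) \in \mathcal{R}, \\ \{[n]\}, & \text{else,} \end{cases}$

and let $\mathcal{R}_x^{\min}$ be the subset of inclusion-minimal elements of
$\mathcal{R}_x$. If $(\kappa_A)$ is $\mathcal{R}$-robust in $\mathcal{S}$, then

    $\kappa(x; x_0) = \kappa_R(x|_R; x_0)$ for any $R \in \mathcal{R}_x^{\min}$, $x \in \mathcal{S}$,

and hence

    $\kappa(x; x_0) = \Big( \prod_{R \in \mathcal{R}_x^{\min}} \kappa_R(x|_R; x_0) \Big)^{1/|\mathcal{R}_x^{\min}|}$.

For any $C \subseteq [n]$ let
$\mathcal{R}_x^{\min}(C) = \{R \in \mathcal{R}_x^{\min} : R \subseteq C\}$. If
$\mathcal{R}$ is coherent, then we can deduce

(14)    $\kappa_C(x|_C; x_0) = \Big( \prod_{R \in \mathcal{R}_x^{\min}(C)} \kappa_R(x|_R; x_0) \Big)^{1/|\mathcal{R}_x^{\min}(C)|}$

for all $x \in \mathcal{S}$ with $\mathcal{R}_x^{\min}(C) \neq \emptyset$. This
motivates the following definition: Denote by $M_{\mathcal{R}}$ the set of all
strictly positive functional modalities that satisfy

    $\kappa_C(x|_C; x_0) = \frac{1}{Z_{C,x|_C}} \Big( \prod_{R \in \mathcal{R}_x^{\min}(C)} \kappa_R(x|_R; x_0) \Big)^{1/|\mathcal{R}_x^{\min}(C)|}$

for all $x \in \mathcal{X}$ and all $C \subseteq [n]$ with
$\mathcal{R}_x^{\min}(C) \neq \emptyset$, where $Z_{C,x|_C}$ is a suitable
normalization constant. The same proof as for Theorem 13 implies: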

Theorem 16. Let (ka) be functional modalities, and assume that R is coherent. 
Then there exist functional modalities (ka) in the closure of M<r such that the fol- 
lowing holds: If (ka) is < R-robust on a set S c X, then 

K A {x\ A ) = k a {x\ a ) for all x e S. 
As a generalization of Lemma 12 we can also describe $\mathcal{M}_{\mathcal{R}}$ as a set of functional
modalities with limited interaction order. To simplify the presentation, we assume
that $\mathcal{R}$ is saturated, by which we mean the following: If $(R, x_R) \in \mathcal{R}$ for some
$x_R \in \mathcal{X}_R$, then $(R, x'_R) \in \mathcal{R}$ for all $x'_R \in \mathcal{X}_R$. In other words, a saturated robustness
specification is given by enumerating a family of subsets of $[n]$. For example,
the robustness structures $\mathcal{R}_k$ are saturated, while the robustness structures defining
canalyzing and nested canalyzing functions (see Section 3) are not saturated. If $\mathcal{R}$
is saturated, then $\mathcal{R}_x$ and $\mathcal{R}_x^{\min}$ are independent of $x \in \mathcal{X}$.

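Consider the family

$$\Delta = \{ C \subseteq [n] : C \subseteq R \text{ for some } R \in \mathcal{R}_x^{\min} \text{ and } x \in \mathcal{X} \},$$

and let $\Delta(C) = \{B \in \Delta : B \subseteq C\}$. Let $\mathcal{M}_\Delta$ be the set of all functional modalities $(\kappa_A)$
such that there exist potentials $\varphi_A$ of the form

$$\varphi_A(x_A; x_0) = \sum_{B \in \Delta(A)} \alpha_{A,B}\, \Psi_B(x_A|_B; x_0), \tag{15}$$

where $\alpha_{A,B} \in \mathbb{R}$ and $\Psi_B$ is an arbitrary function $\mathcal{X}_B \times \mathcal{X}_0 \to \mathbb{R}$. We call $\mathcal{M}_\Delta$ the family
of $\Delta$-interaction functional modalities. Note that the functions $\Psi_B$ do not depend
on $A$. This ensures a certain interdependence among the functional modalities $\kappa_A$.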

Lemma 17. Assume that $\mathcal{R}$ is coherent and saturated. Then $\mathcal{M}_{\mathcal{R}}$ is a subset of $\mathcal{M}_\Delta$.

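Proof. If $\mathcal{R} = \emptyset$, then $\Delta$ contains all sets. The Möbius inversion formula shows
that $\mathcal{M}_\Delta$ contains all strictly positive functional modalities. Therefore, we may
assume that $\mathcal{R} \neq \emptyset$.

Define Gibbs potentials using the Möbius inversion. If $x \in S$ and $A$ is large
enough such that $\mathcal{R}_x^{\min}(A) \neq \emptyset$, then

$$\sum_{\substack{C \subseteq A \\ C \in \mathcal{R}_x}} (-1)^{|A \setminus C|} \ln \kappa_C(x|_C; x_0) = \sum_{\substack{C \subseteq A \\ C \in \mathcal{R}_x}} \frac{(-1)^{|A \setminus C|}}{|\mathcal{R}_x^{\min}(C)|} \sum_{B \in \mathcal{R}_x^{\min}(C)} \ln \kappa_B(x|_B; x_0).$$

Together with the Möbius inversion this gives

$$\varphi_A(x|_A, x_0) = \sum_{\substack{C \subseteq A \\ C \notin \mathcal{R}_x}} \alpha_{A,C} \ln \kappa_C(x|_C; x_0) + \sum_{\substack{C \subseteq A \\ C \in \mathcal{R}_x^{\min}}} \alpha_{A,C} \ln \kappa_C(x|_C; x_0),$$

where

$$\alpha_{A,C} = \begin{cases} (-1)^{|A| - |C|}, & \text{if } C \notin \mathcal{R}_x, \\[4pt] \displaystyle\sum_{R \subseteq A \setminus C} \frac{(-1)^{|A| - |R| - |C|}}{|\mathcal{R}_x^{\min}(C \cup R)|}, & \text{if } C \in \mathcal{R}_x^{\min}. \end{cases}$$

This is clearly of the form (15). □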

In the case $\mathcal{R} = \mathcal{R}_k$ the sum $\sum_{R \subseteq A \setminus B} (-1)^{|A| - |R| - |B|} \, |\mathcal{R}_x^{\min}(B \cup R)|^{-1}$ that appears in the
proof of Lemma 17 can be evaluated explicitly, resulting in the statement of Lemma 12.
In the general case this is not possible.
Corollary 14 also generalizes. Let $\Delta$ be as above. The set of $\Delta$-interaction
stochastic maps consists of all strictly positive stochastic maps $\kappa$ such that

$$\ln \kappa(x; x_0) = \sum_{A \in \Delta} \Psi_A(x|_A; x_0)$$

for some real functions $\Psi_A : \mathcal{X}_A \times \mathcal{X}_0 \to \mathbb{R}$.


Corollary 18. Let $\kappa$ be a stochastic map, and let $\mathcal{R}$ be a coherent and saturated
robustness specification. There exists a stochastic map $\tilde\kappa$ in the closure of the set of
$\Delta$-interaction stochastic maps such that the following holds: If $\kappa$ is $\mathcal{R}$-robust on a
set $S$, then $\kappa(x) = \tilde\kappa(x)$ for all $x \in S$.

The proof is the same as the proof of Corollary 14.

Remark 19. Instead of representing functional modalities as a family $(\kappa_A)$ of
stochastic maps, it is possible to use a single stochastic map $\bar\kappa$, operating on a larger
space, that integrates the information from the family $(\kappa_A)$. The stochastic map $\bar\kappa$
can be constructed as follows: For each $i = 1, \dots, n$ let $\bar{\mathcal{X}}_i$ be the disjoint union of
$\mathcal{X}_i$ and one additional element, denoted by 0. This additional state represents the
knockout of $X_i$. Let $\bar{\mathcal{X}}_{\mathrm{in}} = \bar{\mathcal{X}}_1 \times \dots \times \bar{\mathcal{X}}_n$. For each $y \in \bar{\mathcal{X}}_{\mathrm{in}}$ let $\mathrm{supp}(y) = \{i : y_i \neq 0\}$.
We define the stochastic map $\bar\kappa$ from $\bar{\mathcal{X}}_{\mathrm{in}}$ to $\mathcal{X}_0$ via

$$\bar\kappa(x; x_0) = \kappa_{\mathrm{supp}(x)}(x|_{\mathrm{supp}(x)}; x_0).$$

This construction gives a one-to-one correspondence between functional modalities
and stochastic maps from $\bar{\mathcal{X}}_{\mathrm{in}}$ to $\mathcal{X}_0$.

As an example, consider the functional modalities defined in (3). In this example,
the construction of $\bar\kappa$ is particularly easy: It just amounts to extending the input
space to $\{-1, 0, +1\}^n$. Equation (3) remains valid for $\bar\kappa$. The construction is more
complicated for the functional modalities (12).
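The construction of Remark 19 can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the modalities have a sigmoid form in which every present input contributes $w_i x_i$ to a local field; the weights, the inverse temperature and all function names are hypothetical and not taken from the paper. Under this assumption, encoding a knockout as the state 0 makes the knocked-out input drop out of the sum, so the single map on $\{-1, 0, +1\}^n$ coincides with the family $(\kappa_A)$.

```python
import math
from itertools import product

# Hypothetical weights and inverse temperature, chosen only for illustration.
WEIGHTS = (0.5, -1.0, 2.0)
BETA = 1.5

def kappa_A(A, x_A):
    """Assumed modality: P(output = +1) when only the inputs in A are present.

    A is a tuple of input indices, x_A the matching tuple of states in {-1, +1}.
    """
    field = sum(WEIGHTS[i] * s for i, s in zip(A, x_A))
    return 1.0 / (1.0 + math.exp(-BETA * field))

def supp(y):
    """Indices of the inputs that are present (state 0 encodes a knockout)."""
    return tuple(i for i, s in enumerate(y) if s != 0)

def kappa_bar(y):
    """Single stochastic map on {-1, 0, +1}^n built from the family (kappa_A)."""
    A = supp(y)
    return kappa_A(A, tuple(y[i] for i in A))

def kappa_bar_direct(y):
    """The same sigmoid evaluated directly on the extended state y."""
    field = sum(w * s for w, s in zip(WEIGHTS, y))
    return 1.0 / (1.0 + math.exp(-BETA * field))

EXTENDED_INPUTS = list(product((-1, 0, 1), repeat=len(WEIGHTS)))
MAX_GAP = max(abs(kappa_bar(y) - kappa_bar_direct(y)) for y in EXTENDED_INPUTS)
```

Here `MAX_GAP` measures the largest difference between the two ways of evaluating the map over all 27 extended input states; it vanishes up to rounding because a zero state contributes nothing to the field.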

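More generally, any Gibbs representation for functional modalities $(\kappa_A)$
extends to a Gibbs representation of $\bar\kappa$: For any $B \subseteq [n]$, $x_0 \in \mathcal{X}_0$ and $x \in \bar{\mathcal{X}}_{\mathrm{in}}$ let

$$\bar\varphi_B(x, x_0) = \begin{cases} \varphi_B(x|_B, x_0), & \text{if } B \subseteq \mathrm{supp}(x), \\ 0, & \text{else.} \end{cases}$$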



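Then

$$\bar\kappa(x; x_0) = \frac{\exp\big( \sum_{B \subseteq [n]} \bar\varphi_B(x, x_0) \big)}{\sum_{x'_0 \in \mathcal{X}_0} \exp\big( \sum_{B \subseteq [n]} \bar\varphi_B(x, x'_0) \big)}.$$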



5. Robustness and conditional independence 

Given the probability distribution $p_{\mathrm{in}}$ of the input variables and a stochastic
map $\kappa$ describing the system, the joint probability distribution of the complete
system can be computed from

$$p(x_0, x) = \kappa(x; x_0)\, p_{\mathrm{in}}(x), \quad \text{for all } (x_0, x) \in \mathcal{X}.$$

As shown in Proposition 2, robustness of stochastic maps is related to conditional
independence constraints on the joint distribution. In this section we study the set
of all joint distributions that arise from robust systems in this way.

Let $\mathcal{R}$ be a robustness specification. By Proposition 2, the stochastic map $\kappa$ is
$\mathcal{R}$-robust on $\mathrm{supp}(p_{\mathrm{in}})$ if and only if for all $(R, x_R) \in \mathcal{R}$ the output $X_0$ is (stochastically)
independent of $X_{[n] \setminus R}$ given that $X_R = x_R$. In the following, this conditional
independence (CI) statement will be written as $X_0 \perp\!\!\!\perp X_{[n] \setminus R} \mid X_R = x_R$. This motivates
the following definition: A joint distribution $p$ is called $\mathcal{R}$-robust if it satisfies
$X_0 \perp\!\!\!\perp X_{[n] \setminus R} \mid X_R = x_R$ for all $(R, x_R) \in \mathcal{R}$. We denote by $\mathcal{P}_{\mathcal{R}}$ the set of all $\mathcal{R}$-robust
probability distributions.

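The single conditional independence statement $X_0 \perp\!\!\!\perp X_{[n] \setminus R} \mid X_R = x_R$ means
that the conditional distributions satisfy

$$p(X_0 = x_0 \mid X_{\mathrm{in}} = x) = p(X_0 = x_0 \mid X_R = x_R), \quad \text{for all } x \in \mathcal{X}_{\mathrm{in}} \text{ with } p(x) > 0 \text{ and } x|_R = x_R.$$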

It is often convenient to use another definition that avoids the need to work with
conditional distributions: Writing $S = [n] \setminus R$, the statement $X_0 \perp\!\!\!\perp X_{[n] \setminus R} \mid X_R = x_R$ holds if and only if

$$p(x_0, x_S, x_R)\, p(x'_0, x'_S, x_R) = p(x_0, x'_S, x_R)\, p(x'_0, x_S, x_R) \tag{16}$$

for all $x_0, x'_0 \in \mathcal{X}_0$ and $x_S, x'_S \in \mathcal{X}_S$. Here, $p(x_0, x_S, x_R)$ is an abbreviation
of $p(X_0 = x_0, X_S = x_S, X_R = x_R)$. It is not difficult to see that these two definitions
of conditional independence are equivalent. The formulation in terms of determinantal
equations is used in algebraic statistics [4] and will also turn out to be useful
here.
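The determinantal form (16) is easy to test numerically. The following Python sketch builds a joint distribution $p(x_0, x) = \kappa(x; x_0)\, p_{\mathrm{in}}(x)$ for two binary inputs and checks all $2 \times 2$ minors on the fibre $\{x : x|_R = x_R\}$. The kernel, the uniform input distribution and the function names are hypothetical choices made for illustration: the kernel ignores the second input whenever the first input equals 1, so the CI statement should hold for $R = \{1\}$, $x_R = 1$ and fail for $x_R = 0$ (indices are 0-based in the code).

```python
from itertools import product

OUTPUTS = (0, 1)
INPUTS = list(product((0, 1), repeat=2))

def kappa(x):
    """Hypothetical kernel: if x[0] == 1 the output ignores x[1]."""
    if x[0] == 1:
        return (0.1, 0.9)
    return (0.8, 0.2) if x[1] == 0 else (0.3, 0.7)

# Uniform input distribution (an assumption of this example).
P_IN = {x: 0.25 for x in INPUTS}

# Joint distribution p(x0, x) = kappa(x; x0) * p_in(x).
P = {(x0, x): kappa(x)[x0] * P_IN[x] for x in INPUTS for x0 in OUTPUTS}

def satisfies_16(p, R, x_R, tol=1e-12):
    """Check equation (16) for the statement X0 indep. of the rest given X_R = x_R.

    R is a tuple of input indices and x_R the tuple of their fixed values.
    """
    fibre = [x for x in INPUTS if all(x[i] == v for i, v in zip(R, x_R))]
    return all(
        abs(p[a, x] * p[b, y] - p[a, y] * p[b, x]) <= tol
        for x, y in product(fibre, repeat=2)
        for a, b in product(OUTPUTS, repeat=2)
    )
```

With these choices `satisfies_16(P, (0,), (1,))` holds while `satisfies_16(P, (0,), (0,))` does not, without any conditional distribution being computed.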

A joint probability distribution $p$ can be written as a $d_0 \times |\mathcal{X}_{\mathrm{in}}|$-matrix. Each
equation (16) imposes conditions on this matrix saying that certain submatrices
have rank one. To be precise, for any edge $(x, x')$ in the graph $G_{\mathcal{R}}$ (defined in an
earlier section), the equations (16) for all $x_0, x'_0 \in \mathcal{X}_0$ require that the submatrix $(p_{kz})_{k \in \mathcal{X}_0,\, z \in \{x, x'\}}$
has rank one. For any $x \in \mathcal{X}_{\mathrm{in}}$ denote by $p_x$ the vector with components $p_x(x_0) =
p(X_0 = x_0, X_{\mathrm{in}} = x)$ for $x_0 \in \mathcal{X}_0$. Then a distribution $p$ lies in $\mathcal{P}_{\mathcal{R}}$ if and only if
$p_x$ and $p_y$ are proportional for all edges $(x, y)$ of $G_{\mathcal{R}}$. Observe that $p_x$ and $p_y$ are
proportional if and only if either (i) one of $p_x$ and $p_y$ vanishes or (ii) $\kappa(x) = \kappa(y)$.
This observation allows us to reformulate the equivalence (1) $\Leftrightarrow$ (3) of Proposition 2
as follows:

Lemma 20. Let $S = \{x \in \mathcal{X}_{\mathrm{in}} : p_x \neq 0\}$. A distribution $p$ lies in $\mathcal{P}_{\mathcal{R}}$ if and only if
$p_x$ and $p_y$ are proportional whenever $x, y \in S$ lie in the same connected component
of $G_{\mathcal{R},S}$.

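For any family $\mathcal{B}$ of subsets of $\mathcal{X}_{\mathrm{in}}$ let $\mathcal{P}_{\mathcal{B}}$ be the set of probability distributions
$p$ on $\mathcal{X}$ that satisfy the following two conditions:

(1) $\cup\mathcal{B} = \{x \in \mathcal{X}_{\mathrm{in}} : p_x \neq 0\}$,

(2) $p_x$ and $p_y$ are proportional whenever there exists $Z \in \mathcal{B}$ such that $x, y \in Z$.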

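Then $\mathcal{P}_{\mathcal{R}} = \bigcup_{\mathcal{B}} \mathcal{P}_{\mathcal{B}}$, where the union is over all $\mathcal{R}$-robustness structures $\mathcal{B}$. The
disadvantage of this decomposition is that there are $\mathcal{R}$-robustness structures $\mathcal{B}, \mathcal{B}'$
such that $\mathcal{P}_{\mathcal{B}}$ is a subset of the topological closure $\overline{\mathcal{P}_{\mathcal{B}'}}$ of $\mathcal{P}_{\mathcal{B}'}$. In other words, each
$p \in \mathcal{P}_{\mathcal{B}}$ can be approximated arbitrarily well by elements of $\mathcal{P}_{\mathcal{B}'}$, and therefore in
many cases it suffices to consider only $\mathcal{P}_{\mathcal{B}'}$. The following definition is needed: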

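Definition 21. An $\mathcal{R}$-robustness structure $\mathcal{B}$ is maximal if and only if $\cup\mathcal{B} :=
\bigcup_{Z \in \mathcal{B}} Z$ satisfies any of the following equivalent conditions:

(1) For any $x \in \mathcal{X}_{\mathrm{in}} \setminus \cup\mathcal{B}$ there are edges $(x, y)$, $(x, z)$ in $G_{\mathcal{R}}$ such that $y, z \in \cup\mathcal{B}$
do not lie in the same connected component of $G_{\mathcal{R}, \cup\mathcal{B}}$.

(2) For any $x \in \mathcal{X}_{\mathrm{in}} \setminus \cup\mathcal{B}$ the induced subgraph $G_{\mathcal{R}, \cup\mathcal{B} \cup \{x\}}$ has fewer connected
components than $G_{\mathcal{R}, \cup\mathcal{B}}$.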






JOHANNES RAUH AND NIHAT AY 



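Lemma 22. $\mathcal{P}_{\mathcal{R}}$ equals the disjoint union $\bigcup_{\mathcal{B}} \mathcal{P}_{\mathcal{B}}$, where the union is over all $\mathcal{R}$-robustness
structures. Alternatively, $\mathcal{P}_{\mathcal{R}}$ equals the (non-disjoint) union $\bigcup_{\mathcal{B}} \overline{\mathcal{P}_{\mathcal{B}}}$,
where the union is over all maximal $\mathcal{R}$-robustness structures.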

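Proof. The first statement follows directly from the above considerations. To see
that it suffices to take maximal $\mathcal{R}$-robustness structures in the second decomposition,
consider an $\mathcal{R}$-robustness structure $\mathcal{B}$ that is not maximal. By definition there
exists $x \in \mathcal{X}_{\mathrm{in}} \setminus \cup\mathcal{B}$ such that the induced subgraph $G_{\mathcal{R}, \cup\mathcal{B} \cup \{x\}}$ has at least as many
connected components as $G_{\mathcal{R}, \cup\mathcal{B}}$. Let $\mathcal{B}'$ be the family of connected components
of $G_{\mathcal{R}, \cup\mathcal{B} \cup \{x\}}$. If $G_{\mathcal{R}, \cup\mathcal{B} \cup \{x\}}$ has the same number of connected components as $G_{\mathcal{R}, \cup\mathcal{B}}$,
then there is $Y \in \mathcal{B}$ such that $Y \cup \{x\} \in \mathcal{B}'$; otherwise let $Y \in \mathcal{B}$ be arbitrary. Let
$y \in Y$. For any $p \in \mathcal{P}_{\mathcal{B}}$ and $\epsilon > 0$ define a probability distribution $p_\epsilon$ via

$$p_\epsilon(x_0, z) = \begin{cases} p(x_0, z), & \text{if } z \notin \{x, y\}, \\ (1 - \epsilon)\, p(x_0, y), & \text{if } z = y, \\ \epsilon\, p(x_0, y), & \text{if } z = x. \end{cases}$$

That is, a fraction $\epsilon$ of the mass of $y$ is moved to $x$. Then $p_\epsilon \in \mathcal{P}_{\mathcal{B}'}$, and hence $\mathcal{P}_{\mathcal{B}} \subseteq \overline{\mathcal{P}_{\mathcal{B}'}}$. If $\mathcal{B}'$ is not maximal, we may iterate the
process. □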

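The following lemma sheds light on the structure of $\mathcal{P}_{\mathcal{B}}$:

Lemma 23. Fix an $\mathcal{R}$-robustness structure $\mathcal{B}$. Then $\mathcal{P}_{\mathcal{B}}$ consists of all probability
measures of the form

$$p(X_0 = x_0, X_{\mathrm{in}} = x) = \begin{cases} \mu(Z)\, \lambda_Z(x)\, p_Z(x_0), & \text{if } x \in Z \in \mathcal{B}, \\ 0, & \text{if } x \in \mathcal{X}_{\mathrm{in}} \setminus \cup\mathcal{B}, \end{cases} \tag{17}$$

where $\mu$ is a probability distribution on $\mathcal{B}$, $\lambda_Z$ is a probability distribution on
$Z$ for each $Z \in \mathcal{B}$, and $(p_Z)_{Z \in \mathcal{B}}$ is a family of probability distributions on $\mathcal{X}_0$.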

Proof. It is easy to see that (17) indeed defines a probability distribution. By
Lemma 20 it belongs to $\mathcal{P}_{\mathcal{R}}$. In the other direction, any probability measure can be
written as a product

$$p(x_0, x_1, \dots, x_n) = p(Z)\, p(x_1, \dots, x_n \mid (X_1, \dots, X_n) \in Z)\, p(x_0 \mid x_1, \dots, x_n),$$

if $(x_1, \dots, x_n) \in Z \in \mathcal{B}$, and if $p$ is an $\mathcal{R}$-robust probability distribution, then
$p_Z(x_0) := p(x_0 \mid x_1, \dots, x_n)$ depends only on the block $Z$ in which $(x_1, \dots, x_n)$ lies. □

Lemma 22 decomposes the set $\mathcal{P}_{\mathcal{R}}$ of robust probability distributions into the
closures of the smooth manifolds $\mathcal{P}_{\mathcal{B}}$, where $\mathcal{B}$ runs over the maximal $\mathcal{R}$-robustness
structures. Lemma 23 gives natural parametrizations of these manifolds.
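The parametrization (17) can be illustrated with a small Python sketch. It assumes inputs $X_1 \in \{0, 1\}$ and $X_2 \in \{0, 1, 2\}$ and the two blocks $\{0\} \times \{0, 1\}$ and $\{1\} \times \{2\}$; the numerical values of $\mu$, $\lambda_Z$ and $p_Z$, as well as the function names, are hypothetical. The sketch builds $p$ from the three ingredients and checks that columns belonging to the same block are proportional.

```python
def parametrized(blocks, mu, lam, p_out):
    """Joint distribution of the form (17).

    blocks: list of blocks Z (lists of input states)
    mu:     mu[j] is the weight of block j
    lam:    lam[j][x] is the weight of input state x inside block j
    p_out:  p_out[j][x0] is the output distribution attached to block j
    Input states outside all blocks get probability 0 (they are left out).
    """
    p = {}
    for Z, mu_Z, lam_Z, p_Z in zip(blocks, mu, lam, p_out):
        for x in Z:
            for x0, q in enumerate(p_Z):
                p[x0, x] = mu_Z * lam_Z[x] * q
    return p

# Hypothetical example data.
BLOCKS = [[(0, 0), (0, 1)], [(1, 2)]]
MU = [0.6, 0.4]
LAM = [{(0, 0): 0.25, (0, 1): 0.75}, {(1, 2): 1.0}]
P_OUT = [(0.9, 0.1), (0.2, 0.8)]
P = parametrized(BLOCKS, MU, LAM, P_OUT)

def proportional(p, x, y, n_out=2, tol=1e-12):
    """True if the columns p_x and p_y are proportional (all 2x2 minors vanish)."""
    return all(abs(p[a, x] * p[b, y] - p[a, y] * p[b, x]) <= tol
               for a in range(n_out) for b in range(n_out))
```

In this example `proportional(P, (0, 0), (0, 1))` holds, because both columns are multiples of the same output distribution, whereas columns from different blocks are not proportional as long as the attached output distributions differ.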

By comparison, Theorem 16 and Lemma 17 describe robustness from a different
point of view. The result can be translated to the setting of this section as follows:

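Corollary 24. Suppose that $\mathcal{R}$ is a coherent and saturated robustness structure,
and define $\Delta$ as in the previous section. If $p \in \mathcal{P}_{\mathcal{B}}$, then there exists a stochastic map $\kappa$ in the
closure of the set of $\Delta$-interaction stochastic maps
such that $p(x_0 \mid x) = \kappa(x; x_0)$ for all $x \in \cup\mathcal{B}$.

In the statement of the corollary note that $p(X_{\mathrm{in}} = x) > 0$ for all $x \in \cup\mathcal{B}$, and
hence the conditional distribution $p(x_0 \mid x)$ is well-defined in this case.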
Corollary 24 can also be viewed from the perspective of hierarchical models:
Let $\tilde\Delta = \{\{1, \dots, n\}\} \cup \{S \cup \{0\} : S \in \Delta\}$. The hierarchical loglinear model $\mathcal{E}_{\tilde\Delta}$
consists of all probability distributions $p$ on $\mathcal{X}$ of the form

$$p(x) = \exp\Big( \sum_{A \in \tilde\Delta} \varphi_A(x|_A) \Big),$$

where $\varphi_A$ is a real function with domain $\mathcal{X}_A$. By the results of this section, $\mathcal{E}_{\tilde\Delta}$ is a
smooth manifold containing P<r in its closure. See ifTTl Rl for more on hierarchical 
loglinear models. 

Remark 25. It is also possible to derive the decomposition in Lemma 22 from
results from commutative algebra. Since the equations (16) that describe conditional
independence are algebraic, they generate a polynomial ideal, called the conditional
independence ideal. In this case the ideal is a generalized binomial edge ideal,
as defined in [13]. For such ideals, the primary decomposition is known and
corresponds precisely to the decomposition of the set of robust distributions as presented
in Lemma 22. The parametrization of Lemma 23 can be considered as a surjective
polynomial map and shows that all components of the decomposition are rational.



In this section we consider the symmetric case $\mathcal{R} = \mathcal{R}_k$. As above, we replace
any prefix or subscript $\mathcal{R}$ by $k$.

If $k = 0$, then any pair $(x, y)$ is an edge in $G_0$. This means that any 0-robustness
structure $\mathcal{B}$ contains only one set. There is only one maximal 0-robustness structure,
namely $\bar{\mathcal{B}} = \{\mathcal{X}_{\mathrm{in}}\}$. The set $\mathcal{P}_0$ is irreducible. This corresponds to the fact that $\mathcal{P}_0$
is defined by $X_0 \perp\!\!\!\perp X_{\mathrm{in}}$.

$\bar{\mathcal{B}}$ is actually a maximal $k$-robustness structure for any $0 \le k < n$. This illustrates
the fact that the single CI statement $X_0 \perp\!\!\!\perp X_{\mathrm{in}}$ implies all other CI statements of the
form $X_0 \perp\!\!\!\perp X_{[n] \setminus R} \mid X_R = x_R$. The corresponding set $\mathcal{P}_{\bar{\mathcal{B}}}$ contains all probability
distributions of $\mathcal{P}_k$ of full support.

Now let $k = 1$. In the case $n = 2$ we obtain results by Alexander Fink, which can
be reformulated as follows [?]: Let $n = 2$. A 1-robustness structure $\mathcal{B}$ is maximal
if and only if the following statements hold:

• Each $B \in \mathcal{B}$ is of the form $B = S_1 \times S_2$, where $S_1 \subseteq \mathcal{X}_1$, $S_2 \subseteq \mathcal{X}_2$.

• For every $x_1 \in \mathcal{X}_1$ there exist $B \in \mathcal{B}$ and $x_2 \in \mathcal{X}_2$ such that $(x_1, x_2) \in B$,
and conversely.

In [2] a different description is given: The block $S_1 \times S_2$ can be identified with
the complete bipartite graph on $S_1$ and $S_2$. In this way, every maximal 1-robustness
structure corresponds to a collection of complete bipartite subgraphs with vertices
in $\mathcal{X}_1 \cup \mathcal{X}_2$ such that every vertex in $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively, is part of one such
subgraph. Figure 2 shows an example.

This result generalizes in the following way: 

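Lemma 26. A 1-robustness structure $\mathcal{B}$ is maximal if and only if the following
statements hold:

• Each $B \in \mathcal{B}$ is of the form $B = S_1 \times \dots \times S_n$, where $S_i \subseteq \mathcal{X}_i$.

• $\bigcup_{S_1 \times \dots \times S_n \in \mathcal{B}} S_i = \mathcal{X}_i$ for all $i \in [n]$.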




Figure 2. A 1-robustness structure for two variables. a) The
graph $G_{1,S}$. b) The representation in terms of bipartite graphs.

Proof. Suppose that $\mathcal{B}$ is maximal. Let $Y \in \mathcal{B}$ and let $S_i$ be the projection of
$Y \subseteq \mathcal{X}_{\mathrm{in}}$ to $\mathcal{X}_i$. Let $Y' = S_1 \times \dots \times S_n$. Then $Y \subseteq Y'$. We claim that $(\mathcal{B} \setminus \{Y\}) \cup \{Y'\}$
is another 1-robustness structure with the same number of components as $\mathcal{B}$, and
by maximality we can conclude $Y = Y'$. By Definition 3 we need to show that
$G_{1,Y'}$ is connected and that $G_{1, Z \cup Y'}$ is not connected for all $Z \in \mathcal{B} \setminus \{Y\}$. The first
condition follows from the fact that $G_{1,Y}$ is connected. For the second condition
assume to the contrary that there are $x \in Y'$ and $y \in Z$ such that $x = (x_1, \dots, x_n)$
and $y = (y_1, \dots, y_n)$ disagree in at most $n - 1$ components. Then there exists a
common component $x_l = y_l$. By construction there exists $z = (z_1, \dots, z_n) \in Y$ such
that $z_l = y_l = x_l$, hence $Y \cup Z$ is connected, in contradiction to the assumptions.
This shows that each $Y$ has a product structure.

Write $Y = S_1^Y \times \dots \times S_n^Y$ for each $Y \in \mathcal{B}$. Obviously $S_i^Y \cap S_i^Z = \emptyset$ for all
$i \in [n]$ and all $Y, Z \in \mathcal{B}$ if $Y \neq Z$. For the second assertion, assume to the
contrary that $l \in \mathcal{X}_i$ is contained in no $S_i^Y$. Take any $Y \in \mathcal{B}$ and define $Y' :=
S_1^Y \times \dots \times (S_i^Y \cup \{l\}) \times \dots \times S_n^Y$. Then $(\mathcal{B} \setminus \{Y\}) \cup \{Y'\}$ is another 1-robustness
structure with the same number of components as $\mathcal{B}$, contradicting the assumptions.

Conversely, assume that $\mathcal{B}$ is a 1-robustness structure satisfying the two assertions
of the lemma. For any $x \in \mathcal{X}_{\mathrm{in}} \setminus \cup\mathcal{B}$ there exist $y^{(1)}, \dots, y^{(n)} \in \cup\mathcal{B}$ such that
the $i$-th component of $y^{(i)}$ equals $x_i$ for each $i \in [n]$. Since $x \notin \cup\mathcal{B}$ and all blocks are products, the points $y^{(1)}, \dots, y^{(n)}$ cannot all belong to the
same block of $\mathcal{B}$. If $y^{(i)}$ and $y^{(j)}$ belong to different blocks of $\mathcal{B}$, then the two edges
$(x, y^{(i)})$ and $(x, y^{(j)})$ of $G_1$ show that $\mathcal{B}$ is maximal. □
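Lemma 26 can be checked by brute force on a small example. The Python sketch below assumes two readings taken from the surrounding proofs: two input states are joined by an edge of $G_k$ when they agree in at least $k$ components, and a robustness structure with union $S$ is the family of connected components of the induced subgraph $G_{k,S}$. Maximality is tested via condition (2) of Definition 21. The input sizes $d = (2, 3)$ and all function names are hypothetical choices.

```python
from itertools import product, combinations

def agree(x, y):
    """Number of components in which the input states x and y coincide."""
    return sum(a == b for a, b in zip(x, y))

def components(points, k):
    """Connected components of the subgraph of G_k induced on `points`."""
    points = list(points)
    seen, comps = set(), []
    for start in points:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in points:
                if v not in comp and agree(u, v) >= k:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def is_maximal(S, space, k):
    """Condition (2) of Definition 21: adding any x outside S merges components."""
    base = len(components(S, k))
    return all(len(components(list(S) + [x], k)) < base
               for x in space if x not in S)

def lemma26_conditions(S, dims):
    """Every block is a product set and the projections of S cover each X_i."""
    n = len(dims)
    products_ok = all(
        set(product(*[{x[i] for x in B} for i in range(n)])) == set(B)
        for B in components(S, 1))
    cover_ok = all({x[i] for x in S} == set(range(d)) for i, d in enumerate(dims))
    return products_ok and cover_ok

DIMS = (2, 3)
SPACE = list(product(*[range(d) for d in DIMS]))
MISMATCHES = [S for r in range(len(SPACE) + 1)
              for S in combinations(SPACE, r)
              if is_maximal(set(S), SPACE, 1) != lemma26_conditions(set(S), DIMS)]
```

The list `MISMATCHES` collects every subset $S$ for which maximality and the two conditions of the lemma disagree; under the stated assumptions it is empty.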

The last result can be reformulated in terms of $n$-partite graphs, generalizing [?]:
Namely, the 1-robustness structures are in one-to-one relation with the $n$-partite
subgraphs of $K_{d_1, \dots, d_n}$ such that every connected component is itself a complete
$n$-partite subgraph $K_{e_1, \dots, e_n}$ with $e_i > 0$ for all $i \in [n]$. Here, an $n$-partite graph is a
graph which can be coloured by $n$ colours such that no two vertices with the same
colour are adjacent.

Unfortunately the nice product form of the maximal 1-robustness structures does 
not generalize to k > 1 : 

Example 27 (Binary inputs). If $n = 3$ and $d_1 = d_2 = d_3 = 2$, then the graph $G_2$ is
the graph of the cube. For a maximal 2-robustness structure $\mathcal{B}$ the set $\mathcal{X}_{\mathrm{in}} \setminus \cup\mathcal{B}$ can
be any one of the following (see Fig. 3):

• The empty set

• A set of cardinality 4 corresponding to a plane leaving two connected components
of size 2

• A set of cardinality 4 containing all vertices with the same parity.

• A set of cardinality 3 cutting off a vertex.

In the last case only the isolated vertex has a product structure (Fig. 3d).

Figure 3. The four symmetry classes (panels a to d) of maximal 2-robustness
structures of three binary inputs, see Example 27.

Figure 4. A maximal 3-robustness structure for four binary inputs.

Figure 5. The 2-robustness structure from Example 28. The graph
$G_2$ is the graph of a hypercube of dimension four, where diagonals
have been added to the two-dimensional faces. Only the edges of
$G_2$ that connect vertices of Hamming distance one are shown, and
the edges of $G_{2, \cup\mathcal{B}}$. The two blocks are marked in green and red.
If $n = 4$ and $d_1 = d_2 = d_3 = d_4 = 2$, then the graph $G_3$ is the graph of a
hypercube. Figure 4 shows what a maximal 3-robustness structure can look like.
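The classification in Example 27 can be reproduced by enumeration. The Python sketch below runs over all subsets $S$ of the cube $\{0, 1\}^3$, treats the connected components of the induced subgraph of $G_2$ as the candidate robustness structure, and tests maximality via condition (2) of Definition 21. It assumes that two states are adjacent in $G_k$ when they agree in at least $k$ components; the function names are hypothetical.

```python
from collections import Counter
from itertools import product, combinations

def agree(x, y):
    """Number of components in which x and y coincide."""
    return sum(a == b for a, b in zip(x, y))

def components(points, k):
    """Connected components of the subgraph of G_k induced on `points`."""
    points = list(points)
    seen, comps = set(), []
    for start in points:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in points:
                if v not in comp and agree(u, v) >= k:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def is_maximal(S, space, k):
    """Condition (2) of Definition 21: adding any x outside S merges components."""
    base = len(components(S, k))
    return all(len(components(list(S) + [x], k)) < base
               for x in space if x not in S)

CUBE = list(product((0, 1), repeat=3))

# Tally maximal 2-robustness structures by the size of the complement of S.
TALLY = Counter()
for r in range(len(CUBE) + 1):
    for S in combinations(CUBE, r):
        if is_maximal(set(S), CUBE, 2):
            TALLY[len(CUBE) - r] += 1
```

Under these assumptions the tally contains one structure with empty complement, eight with a complement of size 3 (the neighbours of a single vertex) and eight with a complement of size 4 (six pairs of opposite edges and the two parity classes), which matches the four symmetry classes listed in the example.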

$k$-robustness implies $(k + 1)$-robustness, and therefore $\mathcal{P}_k \subseteq \mathcal{P}_{k+1}$. This does not
mean that all $k$-robustness structures are also $(k + 1)$-robustness structures, for the
following reason: If $\mathcal{B}$ is a $k$-robustness structure and $S = \cup\mathcal{B}$, then $G_{k+1,S}$ may
have more connected components than $G_{k,S}$.

Example 28. Consider $n = 4$ binary random variables $X_1, \dots, X_4$. Then

$$\mathcal{B} := \{\{(1, 1, 1, 1), (2, 2, 1, 1)\}, \{(1, 2, 2, 2), (2, 1, 2, 2)\}\}$$

is a maximal 2-robustness structure. Both elements of $\mathcal{B}$ are connected in $G_2$, but
not in $G_3$, see Fig. 5.
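The claims of Example 28 can be verified directly. The Python sketch below assumes, as suggested by the proofs in this section, that two states are adjacent in $G_k$ when they agree in at least $k$ components, and that maximality can be tested via condition (2) of Definition 21; the function names are hypothetical.

```python
from itertools import product

def agree(x, y):
    """Number of components in which x and y coincide."""
    return sum(a == b for a, b in zip(x, y))

def components(points, k):
    """Connected components of the subgraph of G_k induced on `points`."""
    points = list(points)
    seen, comps = set(), []
    for start in points:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in points:
                if v not in comp and agree(u, v) >= k:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def is_maximal(S, space, k):
    """Condition (2) of Definition 21: adding any x outside S merges components."""
    base = len(components(S, k))
    return all(len(components(list(S) + [x], k)) < base
               for x in space if x not in S)

SPACE = list(product((1, 2), repeat=4))
BLOCKS = [frozenset({(1, 1, 1, 1), (2, 2, 1, 1)}),
          frozenset({(1, 2, 2, 2), (2, 1, 2, 2)})]
UNION = set().union(*BLOCKS)

# The blocks are exactly the components in G_2 ...
BLOCKS_ARE_G2_COMPONENTS = set(components(UNION, 2)) == set(BLOCKS)
# ... the structure is maximal for k = 2 ...
MAXIMAL_FOR_K2 = is_maximal(UNION, SPACE, 2)
# ... but in G_3 every block falls apart into single vertices.
NUM_G3_COMPONENTS = len(components(UNION, 3))
```

Both states of each block agree in exactly two components, which is why they are adjacent in $G_2$ but not in $G_3$; every state of one block agrees with every state of the other block in only one component.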

Nevertheless, the notions of $l$-robustness and $k$-robustness for $l > k$ are related
as follows:

Lemma 29. Assume that $d_1 = \dots = d_n = 2$, and let $\mathcal{B}$ be a maximal $k$-robustness
structure of binary random variables. Then each $B \in \mathcal{B}$ is connected as a subset of
$G_s$ for all $s \le n - 2k + 1$.

Proof. We can identify elements of $\mathcal{X}_{\mathrm{in}}$ with binary strings of length $n$. Denote by
$I_r$ the string $1 \dots 1 0 \dots 0$ of $r$ ones and $n - r$ zeroes in this order. Without loss of
generality assume that $I_0, I_l$ are two elements of $B \in \mathcal{B}$, where $k \le n - l < s \le
n - 2k + 1$. Then $l \ge 2k$, and hence $\lfloor l/2 \rfloor \ge k$. Let $m = \lceil l/2 \rceil$. We will prove that we
can replace $B$ by $B \cup \{I_m\}$ and obtain another $k$-robustness structure. By maximality
this will imply that $I_0$ and $I_l$ are indeed connected by a path in $G_s$.

Otherwise there exist $A \in \mathcal{B}$, $A \neq B$, and $x \in A$ such that $x$ and $I_m$ agree in
at least $k$ components. Let $a$ be the number of zeroes in the first $m$ components
of $x$, let $b$ be the number of ones in the components from $m + 1$ to $l$ and let $c$
be the number of ones in the last $n - l$ components. Then $I_m$ and $x$ disagree in
$a + b + c \le n - k$ components. On the other hand, $x$ and $I_0$ disagree in $(m - a) + b + c$
components, and $x$ and $I_l$ disagree in $a + ((l - m) - b) + c \le a + (m - b) + c$ components.
Assume that $a \ge b$ (otherwise exchange $I_0$ and $I_l$ in the following argument). Then
$x$ and $I_0$ disagree in at most $m + c \le \lceil l/2 \rceil + n - l = n - \lfloor l/2 \rfloor \le n - k$ components, so
$A \cup B$ is connected, in contradiction to the assumptions. □